
Commit e3d5a31

authored May 31, 2021
docs: rename tendermint-core to system (#6515)
1 parent c0fcc5f commit e3d5a31


9 files changed

+398
-418
lines changed


docs/introduction/install.md

+10-2
@@ -8,6 +8,14 @@ order: 3
88

99
To download pre-built binaries, see the [releases page](https://github.com/tendermint/tendermint/releases).
1010

11+
## Using Homebrew
12+
13+
You can also install the Tendermint binary using Homebrew:
14+
15+
```sh
16+
brew install tendermint
17+
```
18+
1119
## From Source
1220

1321
You'll need `go` [installed](https://golang.org/doc/install) and the required
@@ -18,14 +26,14 @@ echo export GOPATH=\"\$HOME/go\" >> ~/.bash_profile
1826
echo export PATH=\"\$PATH:\$GOPATH/bin\" >> ~/.bash_profile
1927
```
2028

21-
### Get Source Code
29+
Get the source code:
2230

2331
```sh
2432
git clone https://github.com/tendermint/tendermint.git
2533
cd tendermint
2634
```
2735

28-
### Compile
36+
Then run:
2937

3038
```sh
3139
make install

docs/introduction/quick-start.md

+2-21
@@ -7,27 +7,8 @@ order: 2
77
## Overview
88

99
This is a quick start guide. If you have a vague idea about how Tendermint
10-
works and want to get started right away, continue.
11-
12-
## Install
13-
14-
### Quick Install
15-
16-
To quickly get Tendermint installed on a fresh
17-
Ubuntu 16.04 machine, use [this script](https://git.io/fFfOR).
18-
19-
> :warning: Do not copy scripts to run on your machine without knowing what they do.
20-
21-
```sh
22-
curl -L https://git.io/fFfOR | bash
23-
source ~/.profile
24-
```
25-
26-
The script is also used to facilitate cluster deployment below.
27-
28-
### Manual Install
29-
30-
For manual installation, see the [install instructions](install.md)
10+
works and want to get started right away, continue. Make sure you've installed the binary.
11+
Check out [install](./install.md) if you haven't.
3112

3213
## Initialization
3314

docs/networks/README.md

+1-1
@@ -2,7 +2,7 @@
22
order: 1
33
parent:
44
title: Networks
5-
order: 5
5+
order: 6
66
---
77

88
# Overview

docs/nodes/README.md

+5-1
@@ -5,13 +5,17 @@ parent:
55
order: 4
66
---
77

8+
# Overview
9+
810
This section will focus on how to operate full nodes, validators and light clients.
911

1012
- [Node Types](#node-types)
1113
- [Configuration](./configuration.md)
12-
- [Configure State sync](./state_sync.md)
14+
- [Configure State sync](./state-sync.md)
1315
- [Validator Guides](./validators.md)
16+
- [Running in Production](./running-in-production.md)
1417
- [How to secure your keys](./validators.md#validator_keys)
18+
- [Remote Signer](./remote-signer.md)
1519
- [Light Client guides](./light-client.md)
1620
- [How to sync a light client](./light-client.md#)
1721
- [Metrics](./metrics.md)
File renamed without changes.

docs/nodes/running-in-production.md

+374
@@ -0,0 +1,374 @@
1+
---
2+
order: 4
3+
---
4+
5+
# Running in production
6+
7+
If you are building Tendermint from source for use in production, make sure to check out an appropriate Git tag instead of a branch.
8+
9+
## Database
10+
11+
By default, Tendermint uses the `syndtr/goleveldb` package for its in-process
12+
key-value database. If you want maximal performance, it may be best to install
13+
the C implementation of LevelDB and build Tendermint against it with
14+
`make build TENDERMINT_BUILD_OPTIONS=cleveldb`. See the [install
15+
instructions](../introduction/install.md) for details.
16+
17+
Tendermint keeps multiple distinct databases in the `$TMHOME/data` directory:
18+
19+
- `blockstore.db`: Keeps the entire blockchain - stores blocks,
20+
block commits, and block metadata, each indexed by height. Used to sync new
21+
peers.
22+
- `evidence.db`: Stores all verified evidence of misbehaviour.
23+
- `state.db`: Stores the current blockchain state (i.e. height, validators,
24+
consensus params). Only grows if consensus params or validators change. Also
25+
used to temporarily store intermediate results during block processing.
26+
- `tx_index.db`: Indexes txs (and their results) by tx hash and by DeliverTx result events.
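To see which of these databases is actually growing on a particular node, it helps to check their on-disk sizes. The following is a minimal Bash sketch, not a Tendermint tool: the function name `db_sizes` is hypothetical, and it assumes the data directory is `$TMHOME/data` (with `TMHOME` falling back to `~/.tendermint`) and that each database is stored as a `*.db` entry, as listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show how much disk space each database under
# <home>/data uses, largest first (sizes in KiB).
# <home> defaults to $TMHOME, then to ~/.tendermint.
db_sizes() {
  local data="${1:-${TMHOME:-$HOME/.tendermint}}/data"
  # du -k reports 1024-byte blocks on both GNU and BSD systems;
  # sort -rn puts the biggest database first.
  du -sk "$data"/*.db | sort -rn
}
```

Running it periodically (for example from cron) shows whether `blockstore.db` and `tx_index.db` dominate growth, as the descriptions above suggest they should.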
27+
28+
By default, Tendermint will only index txs by their hash and height, not by their DeliverTx
29+
result events. See [indexing transactions](../app-dev/indexing-transactions.md) for
30+
details.
31+
32+
Applications can expose block pruning strategies to the node operator. Please read the documentation of your application
33+
to find out more details.
34+
35+
Applications can use [state sync](state-sync.md) to help nodes bootstrap quickly.
36+
37+
## Logging
38+
39+
The default logging level (`log-level = "main:info,state:info,statesync:info,*:error"`) should suffice for
40+
normal operation. Read [this
41+
post](https://blog.cosmos.network/one-of-the-exciting-new-features-in-0-10-0-release-is-smart-log-level-flag-e2506b4ab756)
42+
for details on how to configure the `log-level` config variable. Some of the
43+
modules are listed [here](../nodes/logging.md#list-of-modules). If
44+
you're trying to debug Tendermint or asked to provide logs with debug
45+
logging level, you can do so by running Tendermint with
46+
`--log-level="*:debug"`.
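The `log-level` value is a comma-separated list of `module:level` pairs, where `*` sets the level for every module not listed explicitly. As an illustration of that format only, here is a small Bash helper that assembles the string from `module=level` arguments. `make_log_level` is a hypothetical name and is not part of the Tendermint CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a "module:level,module:level,..." string
# from "module=level" arguments.
make_log_level() {
  local out="" pair module level
  for pair in "$@"; do
    module="${pair%%=*}"   # text before the first '='
    level="${pair#*=}"     # text after the first '='
    out="${out:+$out,}${module}:${level}"
  done
  printf '%s\n' "$out"
}

# make_log_level main=info state=info statesync=info '*=error'
# prints the default value quoted above:
# main:info,state:info,statesync:info,*:error
```

The output can then be passed on the command line, e.g. `tendermint start --log-level="$(make_log_level '*=debug')"`.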
47+
48+
### Consensus WAL
49+
50+
Tendermint uses a write ahead log (WAL) for consensus. The `consensus.wal` is used to ensure we can recover from a crash at any point
51+
in the consensus state machine. It writes all consensus messages (timeouts, proposals, block parts, and votes)
52+
to a single file, flushing to disk before processing messages from its own
53+
validator. Since Tendermint validators are expected to never sign a conflicting vote, the
54+
WAL ensures we can always recover deterministically to the latest state of the consensus without
55+
using the network or re-signing any consensus messages. The consensus WAL has a maximum size of 1GB and is automatically rotated.
56+
57+
If your `consensus.wal` is corrupted, see [below](#wal-corruption).
58+
59+
## DOS Exposure and Mitigation
60+
61+
Validators should set up a [Sentry Node
62+
Architecture](./validators.md)
63+
to mitigate denial-of-service attacks.
64+
65+
### P2P
66+
67+
The core of the Tendermint peer-to-peer system is `MConnection`. Each
68+
connection has a maximum packet size (`MaxPacketMsgPayloadSize`) and
69+
bounded send and receive queues. You can also limit the
70+
send and receive rates per connection (`SendRate`, `RecvRate`).
71+
72+
The number of open P2P connections can become quite large and hit the operating system's open
73+
file limit (since TCP connections are considered files on UNIX-based systems). Nodes should be
74+
given a sizable open file limit, e.g. 8192, via `ulimit -n 8192` or other deployment-specific
75+
mechanisms.
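A minimal sketch of the `ulimit -n 8192` advice, for a node started from a shell script: it raises the soft open-file limit only when it is below the target, and never asks for more than the hard limit, which unprivileged users cannot raise. The function name `raise_nofile` is hypothetical; under systemd, the `LimitNOFILE=` directive is the usual alternative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: raise the soft open-file limit of the current shell
# (and of processes started from it, such as a Tendermint node) to at
# least $1 (default 8192), without exceeding the hard limit.
raise_nofile() {
  local target="${1:-8192}" soft hard
  soft="$(ulimit -Sn)"
  hard="$(ulimit -Hn)"
  # Nothing to do if the soft limit is already unlimited or high enough.
  if [ "$soft" = unlimited ] || [ "$soft" -ge "$target" ]; then
    return 0
  fi
  # Cap the request at the hard limit.
  if [ "$hard" != unlimited ] && [ "$hard" -lt "$target" ]; then
    echo "warning: hard limit is $hard, using it instead of $target" >&2
    target="$hard"
  fi
  ulimit -Sn "$target"
}
```

A launch script would call `raise_nofile 8192` right before starting the node, since the limit is inherited by child processes.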
76+
77+
### RPC
78+
79+
Endpoints returning multiple entries are limited by default to return 30
80+
elements (100 max). See the [RPC Documentation](https://docs.tendermint.com/master/rpc/)
81+
for more information.
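To fetch more entries than fit on one page, a client has to request several pages. The sketch below only prints the URLs to request. It assumes the `page` and `per_page` query parameters of list endpoints such as `/validators`, and the default RPC address `127.0.0.1:26657`; it clamps `per_page` to the documented maximum of 100. The function name `page_urls` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the paginated URLs needed to fetch `total`
# entries from a list endpoint (30 per page by default, 100 at most).
page_urls() {
  local total="$1" per_page="${2:-30}" base="${3:-http://127.0.0.1:26657/validators}"
  local page pages
  # Larger requests are clamped to the documented maximum.
  if [ "$per_page" -gt 100 ]; then per_page=100; fi
  # Ceiling division: e.g. 250 entries at 100 per page -> 3 pages.
  pages=$(( (total + per_page - 1) / per_page ))
  for (( page = 1; page <= pages; page++ )); do
    printf '%s?page=%d&per_page=%d\n' "$base" "$page" "$per_page"
  done
}
```

For example, `page_urls 250 100 | xargs -n1 curl -s` would fetch 250 validators in three requests.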
82+
83+
Rate limiting and authentication are other key measures that help protect
84+
against DoS attacks. Validators should use external tools such as
85+
[NGINX](https://www.nginx.com/blog/rate-limiting-nginx/) or
86+
[traefik](https://docs.traefik.io/middlewares/ratelimit/)
87+
to implement them.
88+
89+
## Debugging Tendermint
90+
91+
If you ever have to debug Tendermint, the first thing you should probably do is
92+
check out the logs. See [Logging](../nodes/logging.md), where we
93+
explain what certain log statements mean.
94+
95+
If things are still not clear after skimming through the logs, the next thing
96+
to try is querying the `/status` RPC endpoint. It provides the necessary info:
97+
whether the node is syncing, what height it is at, etc.
98+
99+
```bash
100+
curl http(s)://{ip}:{rpcPort}/status
101+
```
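When checking `/status` from a script, usually only a couple of fields matter. A minimal sketch, assuming the response layout `result.sync_info.latest_block_height` (a quoted number) and `result.sync_info.catching_up` (a boolean). It uses `sed` rather than `jq` to avoid extra dependencies, so it is not a full JSON parser. The function name `sync_summary` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize a /status JSON response read on stdin.
sync_summary() {
  local json height catching_up
  # Join pretty-printed JSON into one line so a single sed expression works.
  json="$(tr -d '\n')"
  height="$(printf '%s' "$json" | sed -n 's/.*"latest_block_height": *"\([0-9]*\)".*/\1/p')"
  catching_up="$(printf '%s' "$json" | sed -n 's/.*"catching_up": *\([a-z]*\).*/\1/p')"
  printf 'height=%s catching_up=%s\n' "${height:-unknown}" "${catching_up:-unknown}"
}
```

Usage: `curl -s http://127.0.0.1:26657/status | sync_summary` prints a single line with the current height and whether the node is still catching up.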
102+
103+
`/dump_consensus_state` will give you a detailed overview of the consensus
104+
state (proposer, latest validators, peer states). From it, you should be able
105+
to figure out why, for example, the network has halted.
106+
107+
```bash
108+
curl http(s)://{ip}:{rpcPort}/dump_consensus_state
109+
```
110+
111+
There is a reduced version of this endpoint, `/consensus_state`, which returns
112+
just the votes seen at the current height.
113+
114+
If, after consulting the logs and the endpoints above, you still have no idea
115+
what's happening, consider using the `tendermint debug kill` sub-command. This
116+
command collects all the available information and kills the process. See
117+
[Debugging](../tools/debugging.md) for the exact format.
118+
119+
You can inspect the resulting archive yourself or create an issue on
120+
[GitHub](https://github.com/tendermint/tendermint). Before opening an issue,
121+
however, check that there is no [existing
122+
issue](https://github.com/tendermint/tendermint/issues) already.
123+
124+
## Monitoring Tendermint
125+
126+
Each Tendermint instance has a standard `/health` RPC endpoint, which responds
127+
with 200 (OK) if everything is fine and with 500 (or no response at all) if something is
128+
wrong.
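A monitoring script or load-balancer probe can rely on the status code alone. The sketch below treats anything other than HTTP 200 as unhealthy, including no response at all (which `curl` reports as `000`). The default URL assumes the RPC server listens on `127.0.0.1:26657`, and `check_health` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical liveness probe: succeed only if /health answers HTTP 200.
check_health() {
  local url="${1:-http://127.0.0.1:26657/health}" code
  # -o /dev/null discards the body; -w prints only the status code.
  # curl prints 000 when no response is received at all.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")"
  if [ "$code" = 200 ]; then
    return 0
  fi
  echo "unhealthy: HTTP ${code:-none} from $url" >&2
  return 1
}
```

Because it only uses the exit status, it fits cron jobs and alerting scripts, e.g. `check_health || notify-oncall` (where `notify-oncall` stands for whatever alerting command you use).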
129+
130+
Other useful endpoints include the previously mentioned `/status`, as well as `/net_info` and
131+
`/validators`.
132+
133+
Tendermint can also report and serve Prometheus metrics. See
134+
[Metrics](./metrics.md).
135+
136+
The `tendermint debug dump` sub-command can be used to periodically dump useful
137+
information into an archive. See [Debugging](../tools/debugging.md) for more
138+
information.
139+
140+
## What happens when my app dies
141+
142+
You should run Tendermint under a [process
143+
supervisor](https://en.wikipedia.org/wiki/Process_supervision) (like
144+
systemd or runit). It will ensure Tendermint is always running (despite
145+
possible errors).
146+
147+
Getting back to the original question, if your application dies,
148+
Tendermint will panic. After a process supervisor restarts your
149+
application, Tendermint should be able to reconnect successfully. The
150+
order of restart does not matter.
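With systemd, for instance, a unit along the following lines restarts Tendermint whenever it exits with an error. This is a hedged sketch: the binary path, the `tendermint` user and the home directory are placeholders, and `LimitNOFILE=8192` simply mirrors the open file limit suggested in the P2P section. The hypothetical Bash function `tendermint_unit` only prints the unit, so it can be reviewed before installing it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a minimal systemd unit that keeps
# `tendermint start` running. Arguments: binary path, user, home dir.
tendermint_unit() {
  local bin="${1:-/usr/local/bin/tendermint}" user="${2:-tendermint}" home="${3:-/home/tendermint/.tendermint}"
  cat <<EOF
[Unit]
Description=Tendermint node
After=network-online.target
Wants=network-online.target

[Service]
User=${user}
ExecStart=${bin} start --home ${home}
# Restart after crashes and panics (non-zero exit or fatal signal).
Restart=on-failure
RestartSec=3
LimitNOFILE=8192

[Install]
WantedBy=multi-user.target
EOF
}
```

For example: `tendermint_unit | sudo tee /etc/systemd/system/tendermint.service`, followed by `sudo systemctl daemon-reload && sudo systemctl enable --now tendermint`.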
151+
152+
## Signal handling
153+
154+
We catch SIGINT and SIGTERM and try to shut down cleanly. For other
155+
signals we use the default behavior in Go: [Default behavior of signals
156+
in Go
157+
programs](https://golang.org/pkg/os/signal/#hdr-Default_behavior_of_signals_in_Go_programs).
158+
159+
## Corruption
160+
161+
**NOTE:** Make sure you have a backup of the Tendermint data directory.
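A minimal way to take that backup, assuming the node has been stopped first (so the databases are not written during the copy) and that the home directory is `$TMHOME` (default `~/.tendermint`). `backup_data_dir` is a hypothetical helper that writes a timestamped `tar.gz` archive and prints its path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive <home>/data into <dest> before any recovery
# attempt. Stop the node first.
backup_data_dir() {
  local home="${1:-${TMHOME:-$HOME/.tendermint}}" dest="${2:-.}" archive
  if [ ! -d "$home/data" ]; then
    echo "no data directory under $home" >&2
    return 1
  fi
  archive="$dest/tendermint-data-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps the paths inside the archive relative to the home directory.
  tar -czf "$archive" -C "$home" data && printf '%s\n' "$archive"
}
```

Restoring is the reverse: `tar -xzf <archive> -C "$TMHOME"` with the node stopped.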
162+
163+
### Possible causes
164+
165+
Remember that most corruption is caused by hardware issues:
166+
167+
- RAID controllers with faulty / worn out battery backup, and an unexpected power loss
168+
- Hard disk drives with write-back cache enabled, and an unexpected power loss
169+
- Cheap SSDs with insufficient power-loss protection, and an unexpected power loss
170+
- Defective RAM
171+
- Defective or overheating CPU(s)
172+
173+
Other causes can be:
174+
175+
- Database systems configured with fsync=off and an OS crash or power loss
176+
- Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit.
177+
- Tendermint bugs
178+
- Operating system bugs
179+
- Admin error (e.g., directly modifying Tendermint data-directory contents)
180+
181+
(Source: <https://wiki.postgresql.org/wiki/Corruption>)
182+
183+
### WAL Corruption
184+
185+
If the consensus WAL is corrupted at the latest height and you try to start
186+
Tendermint, replay will fail with a panic.
187+
188+
Recovering from data corruption can be hard and time-consuming. Here are two approaches you can take:
189+
190+
1. Delete the WAL file and restart Tendermint. It will attempt to sync with other peers.
191+
2. Try to repair the WAL file manually:
192+
193+
1) Create a backup of the corrupted WAL file:
194+
195+
```sh
196+
cp "$TMHOME/data/cs.wal/wal" /tmp/corrupted_wal_backup
197+
```
198+
199+
2) Use `./scripts/wal2json` to create a human-readable version:
200+
201+
```sh
202+
./scripts/wal2json/wal2json "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal
203+
```
204+
205+
3) Search for a "CORRUPTED MESSAGE" line.
206+
4) Using the messages before and after the corrupted one, together with the
207+
logs, try to rebuild the corrupted message. If the subsequent
208+
messages are marked as corrupted too (this may happen if the length header
209+
got corrupted or some writes did not make it to the WAL, i.e. truncation),
210+
then remove all the lines starting from the corrupted one and restart
211+
Tendermint.
212+
213+
```sh
214+
$EDITOR /tmp/corrupted_wal
215+
```
216+
217+
5) After editing, convert this file back into binary form by running:
218+
219+
```sh
220+
./scripts/json2wal/json2wal /tmp/corrupted_wal "$TMHOME/data/cs.wal/wal"
221+
```
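The truncation case of step 4 (dropping every line from the first corrupted one onward) is mechanical enough to script. A hedged sketch, assuming the `CORRUPTED MESSAGE` marker from step 3 appears on the corrupted lines of the `wal2json` output; check this against your own dump, and keep the backup from step 1. `truncate_at_corruption` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy <in> to <out>, keeping only the lines that come
# before the first "CORRUPTED MESSAGE" line.
truncate_at_corruption() {
  local in="$1" out="$2" line
  # Line number of the first marker line (empty if there is none).
  line="$(grep -n -m1 'CORRUPTED MESSAGE' "$in" | cut -d: -f1)"
  if [ -z "$line" ]; then
    echo "no CORRUPTED MESSAGE line found in $in" >&2
    return 1
  fi
  # Keep every line strictly before the first corrupted one.
  awk -v stop="$line" 'NR < stop' "$in" > "$out"
}
```

For example, run `truncate_at_corruption /tmp/corrupted_wal /tmp/truncated_wal` and then pass `/tmp/truncated_wal` to `json2wal` as in step 5.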
222+
223+
## Hardware
224+
225+
### Processor and Memory
226+
227+
While actual specs vary depending on the load and validator count, the minimum
228+
requirements are:
229+
230+
- 1GB RAM
231+
- 25GB of disk space
232+
- 1.4 GHz CPU
233+
234+
SSDs are preferable for applications with high transaction throughput.
235+
236+
Recommended:
237+
238+
- 2GB RAM
239+
- 100GB SSD
240+
- x64 2.0 GHz 2 vCPU
241+
242+
By default, Tendermint stores the full block history, which may require significant
243+
disk space over time. Nodes bootstrapped with [state sync](state-sync.md) (see also [this
244+
issue](https://github.com/tendermint/tendermint/issues/828)) and applications that expose block pruning
245+
do not need to store all the past blocks.
246+
247+
### Validator signing on 32 bit architectures (or ARM)
248+
249+
Both our `ed25519` and `secp256k1` implementations require constant-time
250+
`uint64` multiplication. Non-constant-time crypto can leak (and has leaked)
251+
private keys for both `ed25519` and `secp256k1`. Constant-time `uint64` multiplication is not available in hardware
252+
on 32 bit x86 platforms ([source](https://bearssl.org/ctmul.html)), so it
253+
depends on the compiler to make it constant time. It's unclear at
254+
this point whether the Go compiler does this correctly for all
255+
implementations.
256+
257+
**We neither support nor recommend running a validator on 32 bit architectures,
258+
on the "VIA Nano 2000 Series", or on the architectures rated "S-" in the ARM section
259+
of the page linked above.**
260+
261+
### Operating Systems
262+
263+
Tendermint can be compiled for a wide range of operating systems thanks to Go
264+
(the list of \$OS/\$ARCH pairs can be found
265+
[here](https://golang.org/doc/install/source#environment)).
266+
267+
While we do not favor any operating system, more secure and stable Linux server
268+
distributions (like CentOS) should be preferred over desktop operating systems
269+
(like macOS).
270+
271+
### Miscellaneous
272+
273+
NOTE: if you are going to use Tendermint in a public domain, make sure
274+
you read [hardware recommendations](https://cosmos.network/validators) for a validator in the
275+
Cosmos network.
276+
277+
## Configuration parameters
278+
279+
- `p2p.flush-throttle-timeout`
280+
- `p2p.max-packet-msg-payload-size`
281+
- `p2p.send-rate`
282+
- `p2p.recv-rate`
283+
284+
If you are going to use Tendermint in a private domain and you have a
285+
private high-speed network among your peers, it makes sense to lower
286+
the flush throttle timeout and increase the other parameters.
287+
288+
```toml
289+
[p2p]
290+
send-rate=20000000 # 20MB/s
291+
recv-rate=20000000 # 20MB/s
292+
flush-throttle-timeout="10ms"
293+
max-packet-msg-payload-size=10240 # 10KB
294+
```
295+
296+
- `mempool.recheck`
297+
298+
After every block, Tendermint rechecks every transaction left in the
299+
mempool, because the transactions committed in that block may have changed the
300+
application state and made some of the remaining transactions invalid.
301+
If that does not apply to your application, you can disable it by
302+
setting `mempool.recheck=false`.
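Settings like `mempool.recheck` live in `$TMHOME/config/config.toml`, where `mempool.recheck` corresponds to the `recheck` key of the `[mempool]` section. A simplified Bash sketch for changing such a key in place: it assumes the key already exists on a single line inside its section, writes the value verbatim (so string values must carry their own quotes), and is not a general TOML editor. `set_toml_key` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set `key = value` inside [section] of a TOML file.
# Only rewrites an existing single-line key; the value is written verbatim.
set_toml_key() {
  local file="$1" section="$2" key="$3" value="$4" tmp
  tmp="$(mktemp)" || return 1
  if awk -v section="$section" -v key="$key" -v value="$value" '
      # Remember which [section] the current line belongs to.
      /^[ \t]*\[/ {
        current = $0
        gsub(/^[ \t]*\[|\][ \t]*$/, "", current)
      }
      # Rewrite the first matching key line inside the target section.
      !done && current == section && $0 ~ ("^[ \t]*" key "[ \t]*=") {
        print key " = " value
        done = 1
        next
      }
      { print }
      # Exit non-zero when the key was not found, leaving the file alone.
      END { exit done ? 0 : 1 }
    ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    echo "key '$key' not found in [$section] of $file" >&2
    return 1
  fi
}
```

For example, `set_toml_key "$TMHOME/config/config.toml" mempool recheck false` disables rechecking; the node reads its config at startup, so restart it afterwards.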
303+
304+
- `mempool.broadcast`
305+
306+
Setting this to false will stop the mempool from relaying transactions
307+
to other peers until they are included in a block. It means only the
308+
peer you send the tx to will see it until it is included in a block.
309+
310+
- `consensus.skip-timeout-commit`
311+
312+
We want `skip-timeout-commit=false` when there are economic incentives at stake,
313+
because proposers should wait to receive more votes. But if you don't
314+
care about that and want the fastest consensus, you can skip it. It will
315+
be kept false by default for public deployments (e.g. [Cosmos
316+
Hub](https://cosmos.network/intro/hub)) while for enterprise
317+
applications, setting it to true is not a problem.
318+
319+
- `consensus.peer-gossip-sleep-duration`
320+
321+
You can try to reduce the time your node sleeps before checking if
322+
there's something to send to its peers.
323+
324+
- `consensus.timeout-commit`
325+
326+
You can also try lowering `timeout-commit` (the time we sleep before
327+
proposing the next block).
328+
329+
- `p2p.addr-book-strict`
330+
331+
By default, Tendermint checks whether a peer's address is routable before
332+
saving it to the address book. The address is considered routable if the IP
333+
is [valid and within allowed
334+
ranges](https://github.com/tendermint/tendermint/blob/27bd1deabe4ba6a2d9b463b8f3e3f1e31b993e61/p2p/netaddress.go#L209).
335+
336+
This may not be the case for private or local networks, where your IP range is usually
337+
strictly limited and private. In that case, you need to set `addr-book-strict`
338+
to `false` (turn it off).
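Private networks typically use the RFC 1918 address ranges, which a strict address book does not treat as routable. As a rough illustration only, the sketch below tests for those three ranges. Tendermint's own check, linked above, covers more cases (for example other reserved ranges and IPv6), and `is_rfc1918` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Returns 0 if $1 is an IPv4 address in an RFC 1918
# private range (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), 1 if it is a
# valid address outside those ranges, 2 if it is not a dotted-quad address.
is_rfc1918() {
  local a b c d o
  IFS=. read -r a b c d <<< "$1"
  for o in "$a" "$b" "$c" "$d"; do
    # Decimal octets only (no leading zeros), each at most 255.
    [[ "$o" =~ ^(0|[1-9][0-9]{0,2})$ ]] && (( o <= 255 )) || return 2
  done
  (( a == 10 )) && return 0
  (( a == 172 && b >= 16 && b <= 31 )) && return 0
  (( a == 192 && b == 168 )) && return 0
  return 1
}
```

If all your peers' addresses pass this check, the network is private and `addr-book-strict = false` is likely what you want.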
339+
340+
- `rpc.max-open-connections`
341+
342+
By default, the number of simultaneous connections is limited because most operating systems
343+
limit the number of file descriptors a process can open.
344+
345+
If you want to accept a greater number of connections, you will need to increase
346+
these limits.
347+
348+
[Sysctls to tune the system to be able to open more connections](https://github.com/satori-com/tcpkali/blob/master/doc/tcpkali.man.md#sysctls-to-tune-the-system-to-be-able-to-open-more-connections)
349+
350+
The process file limits must also be increased, e.g. via `ulimit -n 8192`.
351+
352+
...for N connections, such as 50k:
353+
354+
```md
355+
kern.maxfiles=10000+2*N # BSD
356+
kern.maxfilesperproc=100+2*N # BSD
357+
kern.ipc.maxsockets=10000+2*N # BSD
358+
fs.file-max=10000+2*N # Linux
359+
net.ipv4.tcp_max_orphans=N # Linux
360+
361+
# For load-generating clients.
362+
net.ipv4.ip_local_port_range="10000 65535" # Linux.
363+
net.inet.ip.portrange.first=10000 # BSD/Mac.
364+
net.inet.ip.portrange.last=65535 # (Enough for N < 55535)
365+
net.ipv4.tcp_tw_reuse=1 # Linux
366+
net.inet.tcp.maxtcptw=2*N # BSD
367+
368+
# If using netfilter on Linux:
369+
net.netfilter.nf_conntrack_max=N
370+
echo $((N/8)) > /sys/module/nf_conntrack/parameters/hashsize
371+
```
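The formulas above can be evaluated for a concrete N before applying them. A minimal sketch covering only the Linux server-side settings (the BSD keys and the load-generating client settings are left out). `linux_sysctls_for` is a hypothetical name, and the printed values should be reviewed before being applied with `sysctl -w` or a file under `/etc/sysctl.d/`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: evaluate the Linux formulas above for N connections
# and print them as sysctl key=value lines.
linux_sysctls_for() {
  local n="$1"
  printf 'fs.file-max=%d\n' "$((10000 + 2 * n))"
  printf 'net.ipv4.tcp_max_orphans=%d\n' "$n"
  printf 'net.netfilter.nf_conntrack_max=%d\n' "$n"
  # hashsize is a module parameter, not a sysctl; per the list above it is
  # written to /sys/module/nf_conntrack/parameters/hashsize.
  printf '# hashsize=%d\n' "$((n / 8))"
}
```

For the 50k example, `linux_sysctls_for 50000` yields `fs.file-max=110000`, `net.ipv4.tcp_max_orphans=50000` and `net.netfilter.nf_conntrack_max=50000`, plus a hashsize of 6250 for the `nf_conntrack` module.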
372+
373+
A similar option exists for limiting the number of gRPC connections:
374+
`rpc.grpc-max-open-connections`.

docs/tendermint-core/README.md

+2-2
@@ -1,8 +1,8 @@
11
---
22
order: 1
33
parent:
4-
title: Tendermint Core
5-
order: 4
4+
title: System
5+
order: 5
66
---
77

88
# Overview
+3-390
@@ -1,394 +1,7 @@
11
---
2-
order: 4
2+
order: false
33
---
44

5-
# Running in production
5+
# Running In Production
66

7-
If you are building Tendermint from source for use in production, make sure to check out an appropriate Git tag instead of a branch.
8-
9-
## Database
10-
11-
By default, Tendermint uses the `syndtr/goleveldb` package for its in-process
12-
key-value database. If you want maximal performance, it may be best to install
13-
the real C-implementation of LevelDB and compile Tendermint to use that using
14-
`make build TENDERMINT_BUILD_OPTIONS=cleveldb`. See the [install
15-
instructions](../introduction/install.md) for details.
16-
17-
Tendermint keeps multiple distinct databases in the `$TMROOT/data`:
18-
19-
- `blockstore.db`: Keeps the entire blockchain - stores blocks,
20-
block commits, and block meta data, each indexed by height. Used to sync new
21-
peers.
22-
- `evidence.db`: Stores all verified evidence of misbehaviour.
23-
- `state.db`: Stores the current blockchain state (ie. height, validators,
24-
consensus params). Only grows if consensus params or validators change. Also
25-
used to temporarily store intermediate results during block processing.
26-
- `tx_index.db`: Indexes txs (and their results) by tx hash and by DeliverTx result events.
27-
28-
By default, Tendermint will only index txs by their hash and height, not by their DeliverTx
29-
result events. See [indexing transactions](../app-dev/indexing-transactions.md) for
30-
details.
31-
32-
Applications can expose block pruning strategies to the node operator. Please read the documentation of your application
33-
to find out more details.
34-
35-
Applications can use [state sync](state-sync.md) to help nodes bootstrap quickly.
36-
37-
## Logging
38-
39-
Default logging level (`log-level = "main:info,state:info,statesync:info,*:error"`) should suffice for
40-
normal operation mode. Read [this
41-
post](https://blog.cosmos.network/one-of-the-exciting-new-features-in-0-10-0-release-is-smart-log-level-flag-e2506b4ab756)
42-
for details on how to configure `log-level` config variable. Some of the
43-
modules can be found [here](../nodes/logging#list-of-modules). If
44-
you're trying to debug Tendermint or asked to provide logs with debug
45-
logging level, you can do so by running Tendermint with
46-
`--log-level="*:debug"`.
47-
48-
## Write Ahead Logs (WAL)
49-
50-
Tendermint uses write ahead logs for the consensus (`cs.wal`) and the mempool
51-
(`mempool.wal`). Both WALs have a max size of 1GB and are automatically rotated.
52-
53-
### Consensus WAL
54-
55-
The `consensus.wal` is used to ensure we can recover from a crash at any point
56-
in the consensus state machine.
57-
It writes all consensus messages (timeouts, proposals, block part, or vote)
58-
to a single file, flushing to disk before processing messages from its own
59-
validator. Since Tendermint validators are expected to never sign a conflicting vote, the
60-
WAL ensures we can always recover deterministically to the latest state of the consensus without
61-
using the network or re-signing any consensus messages.
62-
63-
If your `consensus.wal` is corrupted, see [below](#wal-corruption).
64-
65-
### Mempool WAL
66-
67-
The `mempool.wal` logs all incoming txs before running CheckTx, but is
68-
otherwise not used in any programmatic way. It's just a kind of manual
69-
safe guard. Note the mempool provides no durability guarantees - a tx sent to one or many nodes
70-
may never make it into the blockchain if those nodes crash before being able to
71-
propose it. Clients must monitor their txs by subscribing over websockets,
72-
polling for them, or using `/broadcast_tx_commit`. In the worst case, txs can be
73-
resent from the mempool WAL manually.
74-
75-
For the above reasons, the `mempool.wal` is disabled by default. To enable, set
76-
`mempool.wal-dir` to where you want the WAL to be located (e.g.
77-
`data/mempool.wal`).
78-
79-
## DOS Exposure and Mitigation
80-
81-
Validators are supposed to setup [Sentry Node
82-
Architecture](./validators.md)
83-
to prevent Denial-of-service attacks.
84-
85-
### P2P
86-
87-
The core of the Tendermint peer-to-peer system is `MConnection`. Each
88-
connection has `MaxPacketMsgPayloadSize`, which is the maximum packet
89-
size and bounded send & receive queues. One can impose restrictions on
90-
send & receive rate per connection (`SendRate`, `RecvRate`).
91-
92-
The number of open P2P connections can become quite large, and hit the operating system's open
93-
file limit (since TCP connections are considered files on UNIX-based systems). Nodes should be
94-
given a sizable open file limit, e.g. 8192, via `ulimit -n 8192` or other deployment-specific
95-
mechanisms.
96-
97-
### RPC
98-
99-
Endpoints returning multiple entries are limited by default to return 30
100-
elements (100 max). See the [RPC Documentation](https://docs.tendermint.com/master/rpc/)
101-
for more information.
102-
103-
Rate-limiting and authentication are another key aspects to help protect
104-
against DOS attacks. Validators are supposed to use external tools like
105-
[NGINX](https://www.nginx.com/blog/rate-limiting-nginx/) or
106-
[traefik](https://docs.traefik.io/middlewares/ratelimit/)
107-
to achieve the same things.
108-
109-
## Debugging Tendermint
110-
111-
If you ever have to debug Tendermint, the first thing you should probably do is
112-
check out the logs. See [Logging](../nodes/logging.md), where we
113-
explain what certain log statements mean.
114-
115-
If, after skimming through the logs, things are not clear still, the next thing
116-
to try is querying the `/status` RPC endpoint. It provides the necessary info:
117-
whenever the node is syncing or not, what height it is on, etc.
118-
119-
```bash
120-
curl http(s)://{ip}:{rpcPort}/status
121-
```
122-
123-
`/dump_consensus_state` will give you a detailed overview of the consensus
124-
state (proposer, latest validators, peers states). From it, you should be able
125-
to figure out why, for example, the network had halted.
126-
127-
```bash
128-
curl http(s)://{ip}:{rpcPort}/dump_consensus_state
129-
```
130-
131-
There is a reduced version of this endpoint - `/consensus_state`, which returns
132-
just the votes seen at the current height.
133-
134-
If, after consulting with the logs and above endpoints, you still have no idea
135-
what's happening, consider using `tendermint debug kill` sub-command. This
136-
command will scrap all the available info and kill the process. See
137-
[Debugging](../tools/debugging.md) for the exact format.
138-
139-
You can inspect the resulting archive yourself or create an issue on
140-
[Github](https://github.com/tendermint/tendermint). Before opening an issue
141-
however, be sure to check if there's [no existing
142-
issue](https://github.com/tendermint/tendermint/issues) already.
143-
144-
## Monitoring Tendermint
145-
146-
Each Tendermint instance has a standard `/health` RPC endpoint, which responds
147-
with 200 (OK) if everything is fine and 500 (or no response) - if something is
148-
wrong.
149-
150-
Other useful endpoints include mentioned earlier `/status`, `/net_info` and
151-
`/validators`.
152-
153-
Tendermint also can report and serve Prometheus metrics. See
154-
[Metrics](./metrics.md).
155-
156-
`tendermint debug dump` sub-command can be used to periodically dump useful
157-
information into an archive. See [Debugging](../tools/debugging.md) for more
158-
information.
159-
160-
## What happens when my app dies
161-
162-
You are supposed to run Tendermint under a [process
163-
supervisor](https://en.wikipedia.org/wiki/Process_supervision) (like
164-
systemd or runit). It will ensure Tendermint is always running (despite
165-
possible errors).
166-
167-
Getting back to the original question, if your application dies,
168-
Tendermint will panic. After a process supervisor restarts your
169-
application, Tendermint should be able to reconnect successfully. The
170-
order of restart does not matter for it.
171-
172-
## Signal handling
173-
174-
We catch SIGINT and SIGTERM and try to clean up nicely. For other
175-
signals we use the default behavior in Go: [Default behavior of signals
176-
in Go
177-
programs](https://golang.org/pkg/os/signal/#hdr-Default_behavior_of_signals_in_Go_programs).
178-
179-
## Corruption
180-
181-
**NOTE:** Make sure you have a backup of the Tendermint data directory.
182-
183-
### Possible causes
184-
185-
Remember that most corruption is caused by hardware issues:
186-
187-
- RAID controllers with faulty / worn out battery backup, and an unexpected power loss
188-
- Hard disk drives with write-back cache enabled, and an unexpected power loss
189-
- Cheap SSDs with insufficient power-loss protection, and an unexpected power-loss
190-
- Defective RAM
191-
- Defective or overheating CPU(s)
192-
193-
Other causes can be:
194-
195-
- Database systems configured with fsync=off and an OS crash or power loss
196-
- Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit.
197-
- Tendermint bugs
198-
- Operating system bugs
199-
- Admin error (e.g., directly modifying Tendermint data-directory contents)
200-
201-
(Source: <https://wiki.postgresql.org/wiki/Corruption>)
202-
203-
### WAL Corruption
204-
205-
If consensus WAL is corrupted at the latest height and you are trying to start
206-
Tendermint, replay will fail with panic.
207-
208-
Recovering from data corruption can be hard and time-consuming. Here are two approaches you can take:
209-
210-
1. Delete the WAL file and restart Tendermint. It will attempt to sync with other peers.
211-
2. Try to repair the WAL file manually:
212-
213-
1) Create a backup of the corrupted WAL file:
214-
215-
```sh
216-
cp "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal_backup
217-
```
218-
219-
2) Use `./scripts/wal2json` to create a human-readable version:
220-
221-
```sh
222-
./scripts/wal2json/wal2json "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal
223-
```
224-
225-
3) Search for a "CORRUPTED MESSAGE" line.
226-
4) By looking at the previous message and the message after the corrupted one
227-
and looking at the logs, try to rebuild the message. If the consequent
228-
messages are marked as corrupted too (this may happen if length header
229-
got corrupted or some writes did not make it to the WAL ~ truncation),
230-
then remove all the lines starting from the corrupted one and restart
231-
Tendermint.
232-
233-
```sh
234-
$EDITOR /tmp/corrupted_wal
235-
```
236-
237-
5) After editing, convert this file back into binary form by running:
238-
239-
```sh
240-
./scripts/json2wal/json2wal /tmp/corrupted_wal $TMHOME/data/cs.wal/wal
241-
```
242-
243-
## Hardware
244-
245-
### Processor and Memory
246-
247-
While actual specs vary depending on the load and validators count, minimal
248-
requirements are:
249-
250-
- 1GB RAM
251-
- 25GB of disk space
252-
- 1.4 GHz CPU
253-
254-
SSD disks are preferable for applications with high transaction throughput.
255-
256-
Recommended:
257-
258-
- 2GB RAM
259-
- 100GB SSD
260-
- x64 2.0 GHz 2v CPU
261-
262-
While for now, Tendermint stores all the history and it may require significant
263-
disk space over time, we are planning to implement state syncing (See [this
264-
issue](https://github.com/tendermint/tendermint/issues/828)). So, storing all
265-
the past blocks will not be necessary.
266-
267-
### Validator signing on 32 bit architectures (or ARM)
268-
269-
Both our `ed25519` and `secp256k1` implementations require constant time
270-
`uint64` multiplication. Non-constant time crypto can (and has) leaked
271-
private keys on both `ed25519` and `secp256k1`. This doesn't exist in hardware
272-
on 32 bit x86 platforms ([source](https://bearssl.org/ctmul.html)), and it
273-
depends on the compiler to enforce that it is constant time. It's unclear at
274-
this point whenever the Golang compiler does this correctly for all
275-
implementations.
276-
277-
**We do not support nor recommend running a validator on 32 bit architectures OR
278-
the "VIA Nano 2000 Series", and the architectures in the ARM section rated
279-
"S-".**
280-
281-
### Operating Systems
282-
283-
Tendermint can be compiled for a wide range of operating systems thanks to Go
284-
language (the list of \$OS/\$ARCH pairs can be found
285-
[here](https://golang.org/doc/install/source#environment)).
286-
287-
While we do not favor any operation system, more secure and stable Linux server
288-
distributions (like Centos) should be preferred over desktop operation systems
289-
(like Mac OS).
290-
291-
### Miscellaneous
292-
293-
NOTE: if you are going to use Tendermint on a public network, make sure
294-
you read the [hardware recommendations](https://cosmos.network/validators) for validators in the
295-
Cosmos network.
296-
297-
## Configuration parameters
298-
299-
- `p2p.flush-throttle-timeout`
300-
- `p2p.max-packet-msg-payload-size`
301-
- `p2p.send-rate`
302-
- `p2p.recv-rate`
303-
304-
If you are going to use Tendermint in a private deployment and you have a
305-
private high-speed network among your peers, it makes sense to lower the
306-
flush throttle timeout and increase the other parameters:
307-
308-
```toml
309-
[p2p]
310-
send-rate=20000000 # 20MB/s
311-
recv-rate=20000000 # 20MB/s
312-
flush-throttle-timeout=10
313-
max-packet-msg-payload-size=10240 # 10KB
314-
```
315-
316-
- `mempool.recheck`
317-
318-
After every block, Tendermint rechecks every transaction left in the
319-
mempool, because transactions committed in that block may have changed the
320-
application state and made some of the remaining transactions invalid.
321-
If that does not apply to your application, you can disable it by
322-
setting `mempool.recheck=false`.
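To make the recheck step concrete, here is a toy model. It assumes a hypothetical application whose `CheckTx` accepts a transaction only if its nonce is greater than the sender's last committed nonce; `Tx`, `State`, `checkTx`, and `recheck` are illustrative names, not Tendermint or ABCI types. It shows how a single committed block can invalidate transactions still waiting in the mempool.

```go
package main

import "fmt"

// Tx is a hypothetical transaction: a sender and a per-sender nonce.
type Tx struct {
	Sender string
	Nonce  uint64
}

// State is hypothetical application state: the last committed nonce per sender.
type State map[string]uint64

// checkTx mimics an application's CheckTx: a tx is valid only if its
// nonce is strictly greater than the sender's last committed nonce.
func checkTx(s State, tx Tx) bool {
	return tx.Nonce > s[tx.Sender]
}

// recheck re-runs checkTx on every remaining mempool tx after a block
// has been committed and keeps only those that are still valid.
func recheck(s State, mempool []Tx) []Tx {
	kept := make([]Tx, 0, len(mempool))
	for _, tx := range mempool {
		if checkTx(s, tx) {
			kept = append(kept, tx)
		}
	}
	return kept
}

func main() {
	state := State{}
	mempool := []Tx{{"alice", 1}, {"alice", 2}, {"bob", 1}}

	// A block commits a conflicting transaction from alice with nonce 2
	// (for example, one submitted through another node).
	state["alice"] = 2

	mempool = recheck(state, mempool)
	fmt.Println(mempool) // [{bob 1}]
}
```

If your application's validity never depends on previously committed state, every recheck would keep all transactions, which is when `mempool.recheck=false` saves work.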
323-
324-
- `mempool.broadcast`
325-
326-
Setting this to false will stop the mempool from relaying transactions
327-
to other peers until they are included in a block. It means only the
328-
peer you send the tx to will see it until it is included in a block.
329-
330-
- `consensus.skip-timeout-commit`
331-
332-
We want `skip-timeout-commit=false` when there are economic stakes involved,
333-
because proposers should wait to collect more votes. But if you don't
334-
care about that and want the fastest consensus, you can skip it. It will
335-
be kept false by default for public deployments (e.g. [Cosmos
336-
Hub](https://cosmos.network/intro/hub)) while for enterprise
337-
applications, setting it to true is not a problem.
338-
339-
- `consensus.peer-gossip-sleep-duration`
340-
341-
You can try to reduce the time your node sleeps before checking whether
342-
there's something to send to its peers.
343-
344-
- `consensus.timeout-commit`
345-
346-
You can also try lowering `timeout-commit` (the time the node waits before
347-
proposing the next block).
348-
349-
- `p2p.addr-book-strict`
350-
351-
By default, Tendermint checks whether a peer's address is routable before
352-
saving it to the address book. The address is considered routable if the IP
353-
is [valid and within allowed
354-
ranges](https://github.com/tendermint/tendermint/blob/27bd1deabe4ba6a2d9b463b8f3e3f1e31b993e61/p2p/netaddress.go#L209).
355-
356-
This may not be the case for private or local networks, where your IP range is usually
357-
strictly limited and private. In that case, you need to set `addr-book-strict`
358-
to `false` (turn it off).
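The following is a simplified, hypothetical approximation of the kind of address a strict address book rejects; it is not Tendermint's implementation (the linked `netaddress.go` has the real rules). It uses only `net.IP` helpers from the Go standard library (`IsPrivate` needs Go 1.17 or later) and shows why peers on a `10.0.0.0/8` LAN would be dropped unless `addr-book-strict` is `false`.

```go
package main

import (
	"fmt"
	"net"
)

// roughlyRoutable is a simplified stand-in for a strict address-book
// check (hypothetical; not Tendermint's code). It rejects unparsable,
// unspecified, loopback, link-local, and RFC 1918 / RFC 4193 private IPs.
func roughlyRoutable(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	switch {
	case ip.IsUnspecified(), ip.IsLoopback(),
		ip.IsLinkLocalUnicast(), ip.IsPrivate():
		return false
	}
	return true
}

func main() {
	for _, a := range []string{"8.8.8.8", "10.0.0.5", "127.0.0.1", "192.168.1.20"} {
		fmt.Printf("%-13s routable=%v\n", a, roughlyRoutable(a))
	}
}
```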
359-
360-
- `rpc.max-open-connections`
361-
362-
By default, the number of simultaneous connections is limited because most operating
363-
systems give you a limited number of file descriptors.
364-
365-
If you want to accept a greater number of connections, you will need to increase
366-
these limits.
367-
368-
[Sysctls to tune the system to be able to open more connections](https://github.com/satori-com/tcpkali/blob/master/doc/tcpkali.man.md#sysctls-to-tune-the-system-to-be-able-to-open-more-connections)
369-
370-
The process file limits must also be increased, e.g. via `ulimit -n 8192`.
371-
372-
The sysctls to set for N connections (for example, N = 50k) are:
373-
374-
```md
375-
kern.maxfiles=10000+2*N # BSD
376-
kern.maxfilesperproc=100+2*N # BSD
377-
kern.ipc.maxsockets=10000+2*N # BSD
378-
fs.file-max=10000+2*N # Linux
379-
net.ipv4.tcp_max_orphans=N # Linux
380-
381-
# For load-generating clients.
382-
net.ipv4.ip_local_port_range="10000 65535" # Linux.
383-
net.inet.ip.portrange.first=10000 # BSD/Mac.
384-
net.inet.ip.portrange.last=65535 # (Enough for N < 55535)
385-
net.ipv4.tcp_tw_reuse=1 # Linux
386-
net.inet.tcp.maxtcptw=2*N # BSD
387-
388-
# If using netfilter on Linux:
389-
net.netfilter.nf_conntrack_max=N
390-
echo $((N/8)) > /sys/module/nf_conntrack/parameters/hashsize
391-
```
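The formulas in the list are easy to mistype, so here is a small helper that evaluates them for a given N. It only transcribes the expressions above and does not apply any setting; the function names are illustrative, and the port-range check simply encodes the "Enough for N < 55535" note.

```go
package main

import (
	"fmt"
	"sort"
)

// sysctlValues evaluates the expressions listed above for n concurrent
// connections. It only computes numbers; it does not change any setting.
func sysctlValues(n int) map[string]int {
	return map[string]int{
		"kern.maxfiles":                  10000 + 2*n, // BSD
		"kern.maxfilesperproc":           100 + 2*n,   // BSD
		"kern.ipc.maxsockets":            10000 + 2*n, // BSD
		"fs.file-max":                    10000 + 2*n, // Linux
		"net.ipv4.tcp_max_orphans":       n,           // Linux
		"net.inet.tcp.maxtcptw":          2 * n,       // BSD
		"net.netfilter.nf_conntrack_max": n,           // Linux, with netfilter
		"nf_conntrack hashsize (sysfs)":  n / 8,       // /sys/module/nf_conntrack/parameters/hashsize
	}
}

// portRangeSuffices applies the "Enough for N < 55535" note for the
// local port range 10000..65535.
func portRangeSuffices(n int) bool {
	const first, last = 10000, 65535
	return n < last-first
}

func main() {
	const n = 50000
	values := sysctlValues(n)

	// Print in a stable, sorted order.
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%d\n", k, values[k])
	}
	fmt.Println("port range sufficient:", portRangeSuffices(n))
}
```

For N = 50k this gives, for example, `kern.maxfiles=110000` and a conntrack hash size of 6250.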
392-
393-
A similar option exists for limiting the number of gRPC connections:
394-
`rpc.grpc-max-open-connections`.
7+
This file has moved to the [nodes section](../nodes/running-in-production.md).

docs/tools/README.md

+1-1
@@ -2,7 +2,7 @@
22
order: 1
33
parent:
44
title: Tooling
5-
order: 6
5+
order: 8
66
---
77

88
# Overview
