
Node failing to solidify milestones correctly #1655

Open
DyrellC opened this issue Nov 8, 2019 · 4 comments


@DyrellC
Contributor

DyrellC commented Nov 8, 2019

Bug description

When trying to solidify over wide gaps, the node may fail to solidify properly. This manifests in one of two ways. Either the node hangs on the same milestone, producing output such as:

Solidifying milestone #100 [20 / 397]
Solidifying milestone #100 [20 / 398]
Solidifying milestone #100 [20 / 399]
Solidifying milestone #100 [20 / 399]

or the solidifier fails to print a message at all, and the latestSolidMilestone remains the same while the latestMilestone continues to grow. In the latter case there is sometimes only one milestone in the unsolidMilestonesPool, so no output is printed. This reduces to the former scenario: the same milestone is requested over and over, and no further milestones are added to the unsolidMilestonesPool.

When investigating the milestones that were failing to solidify, it appeared that some of the transactions were present in the db but were not marked as milestones, because the other milestone transaction in the bundle was not solid. In other instances the milestone was never found through the transactionValidator.checkSolidity call. This may be caused by a milestone being left behind and effectively orphaned during large spam events or splitting; in these cases the milestone would not be found when solidifying backwards through the tangle.
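As a quick way to tell these two failure modes apart, a diagnostic along these lines can be run against the db (a minimal sketch: TransactionViewModel.fromHash, getType, PREFILLED_SLOT, isSolid and isMilestone are IRI 1.8.x identifiers to the best of my knowledge; the wrapper method, output and elided imports are illustrative only):

// Hypothetical diagnostic: report why a given milestone hash is stuck.
void reportMilestoneState(Tangle tangle, Hash milestoneHash) throws Exception {
    TransactionViewModel tvm = TransactionViewModel.fromHash(tangle, milestoneHash);
    if (tvm.getType() == TransactionViewModel.PREFILLED_SLOT) {
        // never persisted: backwards solidification will keep requesting it
        System.out.println(milestoneHash + " is not in the db");
    } else {
        // present but possibly never flagged as a milestone because its bundle is unsolid
        System.out.println(milestoneHash + " solid=" + tvm.isSolid()
                + " milestone=" + tvm.isMilestone());
    }
}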

IRI version

v1.8.2

Hardware Spec

Linux Mint, 8 GB RAM, 4 CPUs, 2 × 160 GB SSD

Steps To Reproduce

Testnet

  1. Start one node with https://s3.eu-central-1.amazonaws.com/iotaledger-dbfiles/dev/SyncTestDB.tar and another with https://s3.eu-central-1.amazonaws.com/iotaledger-dbfiles/dev/EmptyDB.tar. Make sure the nodes are already neighboured. For faster syncing, add extra solid nodes to the mix.
  2. Start nodes with the following configuration
    java -jar iri-1.* -p 14265 -t 15600 --zmq-enabled true --zmq-port 5556 --testnet true --testnet-coordinator EFPNKGPCBXXXLIBYFGIGYBYTFFPIOQVNNVVWTTIYZO9NFREQGVGDQQHUUQ9CLWAEMXVDFSSMOTGAHVIBH --testnet-no-coo-validation true --milestone-start 0 --mwm 1 --remote true --remote-limit-api "" --snapshot ./snapshot.txt -n 'your.neighbours.here'
  3. Issue a milestone using python milestone.py -i 1001 from https://github.com/DyrellC/iri-regression-tests/tree/add-sync-tests/Nightly-Tests/Sync-Tests to kick-start the solidification
  4. Wait for the node to finish "syncing"

Mainnet (Doesn't always happen)

  1. Start up a node from a couple hundred milestones behind
  2. Let node try to sync
  3. Watch it spin out (Sometimes)

Expected behaviour

Nodes should synchronise properly.

Actual behaviour

Nodes hang on solidifying specific milestones.

@DyrellC
Contributor Author

DyrellC commented Nov 8, 2019

@karimodm and I discussed another issue with solidification within the LatestMilestoneTrackerImpl, which could be the culprit for failed milestone issuance from the coordinator in devnet. As is, the collectMilestoneCandidates call pulls all transactions with the coordinator address and scans through them. However, the ordering of the pulled set is randomised, and the maximum number of transactions to analyse before stopping the scan is currently set to 5000. With ~1.3 million milestones and the randomised ordering of the candidates, each time the collectMilestoneCandidates call is made there is only a 0.38% chance (5000 / 1,300,000) that the newest milestone is present in the first 5000 analysed transactions; see the sketch below.

You could improve the probability of the new milestone being found by increasing the maximum number of analysed transactions, but this isn't a solid solution. I proposed to @achabill that a possible fix would be to order the set of transactions by the attachment timestamp associated with each hash, so that the milestone is very likely to appear within the first X transactions analysed. This would, however, increase the processing requirements for the scan, because each hash's transaction has to be pulled from the db to sort by timestamp. The increase in processing per scan should be offset by the reduced time needed to find new milestones.
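To make the numbers concrete, here is a minimal sketch of the scan described above (the loading call and the per-candidate handler are assumed to mirror IRI 1.8.x; this is not the tracker's exact code, and imports are elided):

// The candidate hashes come back in effectively random order, so cutting the
// scan off after MAX_CANDIDATES_TO_ANALYZE entries only catches the newest
// milestone with probability ~5000 / 1,300,000 ≈ 0.38% per pass.
static final int MAX_CANDIDATES_TO_ANALYZE = 5000;

void scanCandidates(Tangle tangle, Hash coordinatorAddress) throws Exception {
    Set<Hash> candidates = AddressViewModel.load(tangle, coordinatorAddress).getHashes();
    int analyzed = 0;
    for (Hash hash : candidates) {
        if (analyzed++ >= MAX_CANDIDATES_TO_ANALYZE) {
            break; // the newest milestone usually falls past this cut-off
        }
        processMilestoneCandidate(hash); // per-candidate handler, name assumed
    }
}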

@achabill
Contributor

achabill commented Nov 11, 2019

Considering the sorting proposal, we could sort the hashes in AddressViewModel and use the list of TransactionViewModel in collectNewMilestoneCandidates.

AddressViewModel

static List<TransactionViewModel> loadSorted(Tangle tangle, Hash address) throws Exception {
    Set<Hash> hashes = AddressViewModel.load(tangle, address).getHashes();
    List<TransactionViewModel> transactions = new ArrayList<>(hashes.size());
    for (Hash hash : hashes) {
        transactions.add(TransactionViewModel.fromHash(tangle, hash));
    }
    // newest first, so the latest milestone is among the first candidates scanned
    transactions.sort(Comparator.comparingLong(TransactionViewModel::getAttachmentTimestamp).reversed());
    return transactions;
}

We would also have to change milestoneCandidatesToAnalyze and related variables to hold TransactionViewModel instead of Hash, and then call processMilestoneCandidate(TransactionViewModel tvm) directly.
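Under the same assumptions, the consuming side in the tracker might then look roughly like this (names follow this thread rather than IRI's exact code):

// Refill the analysis queue from the timestamp-sorted list so the newest
// candidates are scanned first (names per this thread, assumed).
Deque<TransactionViewModel> milestoneCandidatesToAnalyze =
        new ArrayDeque<>(AddressViewModel.loadSorted(tangle, coordinatorAddress));

int analyzed = 0;
while (analyzed++ < MAX_CANDIDATES_TO_ANALYZE && !milestoneCandidatesToAnalyze.isEmpty()) {
    processMilestoneCandidate(milestoneCandidatesToAnalyze.pollFirst());
}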

achabill self-assigned this Nov 11, 2019
@GalRogozinski
Contributor

If the above solution is quick and works, we can do it.
Otherwise, we can consider the following:
#1447 (comment)

@GalRogozinski
Contributor

Even though we merged a solution, I will be closing this once #1674 is done.
We may revert the changes of #1660 as a result.
