Optimize 0.1 #1

Open
wants to merge 44 commits into
base: main
44 commits
d930672
added snappy to be default compressor
Feb 25, 2024
ad4b05d
improvement based on batch storing
Feb 27, 2024
21f837b
readme
Feb 28, 2024
aa37dbe
todo
Feb 29, 2024
15fc71f
some more load
Mar 2, 2024
d6e3ee7
more load
Mar 4, 2024
457f391
more load
Mar 4, 2024
64f64a5
work on the way
Mar 10, 2024
218be25
more load
Mar 13, 2024
998ec88
more load
Mar 14, 2024
9f9b0ae
achieved 2 seconds read speed
Mar 17, 2024
96b1f82
freezing optimization branch, used binary sparse to optimize
Mar 18, 2024
df43fa9
done
Mar 18, 2024
dcac182
information gathering
Mar 23, 2024
07c6a86
information
Mar 25, 2024
ff92413
information
Mar 25, 2024
1812438
information
Mar 26, 2024
28414d9
information
Mar 27, 2024
6cb524b
more load
Mar 28, 2024
30c9219
more load
Mar 29, 2024
469ed02
more load
Mar 30, 2024
042b574
can write sst now
Mar 30, 2024
556ea91
more load
Mar 30, 2024
5459e1b
sst correctness
Mar 31, 2024
681dfb0
compaction info
Mar 31, 2024
78d6ce1
more load
Apr 2, 2024
0c1594c
more load
Apr 3, 2024
795a8f9
fast kv
Apr 4, 2024
1beafb0
this code generated java bug
Apr 7, 2024
ea21a0b
load
Apr 7, 2024
7efe1ab
load
Apr 7, 2024
4418436
fixed reading speed issue
Jun 16, 2024
6d182f8
achived high performance
Jun 19, 2024
2d3d0c7
minor
Jun 19, 2024
bbb8d4f
more load
Jul 8, 2024
17e3401
idk what i did with git
Sep 23, 2024
7daf5bc
idk what i did with git
Sep 23, 2024
012c266
another load
Nov 9, 2024
5375fb4
moved experiment project and some unwanted files
Nov 24, 2024
7ddf4b7
changed the compactor
Dec 1, 2024
28a29fb
more changes
Dec 27, 2024
fe11d57
improvement in iterators
Jan 5, 2025
327a54a
more
Jan 8, 2025
65b20ff
change in compactor
Jan 11, 2025
96 changes: 96 additions & 0 deletions AtomDB/.idea/compiler.xml

2 changes: 1 addition & 1 deletion AtomDB/.idea/misc.xml

Empty file added AtomDB/.mvn/maven.config
Empty file.
Binary file added AtomDB/Docs/.Todo.md.un~
Binary file not shown.
53 changes: 53 additions & 0 deletions AtomDB/Docs/Compaction.md
@@ -0,0 +1,53 @@
The idea behind compaction.

* It is about creating overlap-free, compact SSTs.
* A compact SST is one that holds every key lying between its min and max key.
* Making SSTs compact reduces overlap. If one SST covers the key range 10-50 and holds every key in that range, a search in that range only needs to access that SST.
* With overlap we end up with SSTs such as 10-70 and 30-70; a search then has to access 2 SSTs.
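
The effect of overlap on lookups can be sketched as follows. This is only an illustration: `OverlapSketch`, `SstRange`, and `candidates` are hypothetical names, not AtomDB classes, and keys are plain integers to match the 10-50 example above.

```java
import java.util.List;

final class OverlapSketch {
    /** Hypothetical SST descriptor with an inclusive key range [minKey, maxKey]. */
    record SstRange(String name, int minKey, int maxKey) {
        boolean mayContain(int key) {
            return minKey <= key && key <= maxKey;
        }
    }

    /** Every SST whose range covers the key has to be opened during a lookup. */
    static List<SstRange> candidates(List<SstRange> ssts, int key) {
        return ssts.stream().filter(s -> s.mayContain(key)).toList();
    }

    public static void main(String[] args) {
        List<SstRange> compact = List.of(
                new SstRange("a", 10, 50), new SstRange("b", 51, 90));
        List<SstRange> overlapping = List.of(
                new SstRange("c", 10, 70), new SstRange("d", 30, 70));

        System.out.println(candidates(compact, 40).size());     // prints 1
        System.out.println(candidates(overlapping, 40).size()); // prints 2
    }
}
```

With non-overlapping ranges at most one SST can match a key, so the candidate list never grows beyond one entry per level.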



Plan:
table, search engine, and benchmark.
benchmark should yield super fast results.
then we optimize and clean our existing code.
then we work on compaction by planning, at the smallest scale, how things will move, and then implementing it.
then optimize and make the code clean.
benchmark and optimize.
write improved unit tests, integration tests, crash tests, and performance tests under different loads.



LevelDB compaction:
1. A 4 MB file is level 0.
2. Pick one file from level L and all overlapping files from level L+1.
3. For level 0 -> 1, we also take all the overlapping files from level 0 as well as level 1, since level 0 is a special level.
4. Start a new SST once the current output file reaches 2 MB.
5. We also switch to a new SST when the current one has grown enough to overlap more than 10 level L+2 files (so that a later compaction won't pick up too many files from L+2).
6. We remember the greatest key compacted at level L so that the next compaction picks files starting after that key.
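
Steps 2 and 6 could look roughly like the sketch below. It is a simplification under stated assumptions: `CompactionPickSketch`, `FileMeta`, `pickInputs`, and `nextSeed` are hypothetical names, keys are compared as plain strings, and the special handling of level 0 (step 3) and the output-file cutting (steps 4 and 5) are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class CompactionPickSketch {
    /** Hypothetical file metadata with an inclusive key range. */
    record FileMeta(int id, String minKey, String maxKey) {
        boolean overlaps(String lo, String hi) {
            return minKey.compareTo(hi) <= 0 && maxKey.compareTo(lo) >= 0;
        }
    }

    /** Step 2: the seed file from level L plus every overlapping file from level L+1. */
    static List<FileMeta> pickInputs(FileMeta seed, List<FileMeta> nextLevel) {
        List<FileMeta> inputs = new ArrayList<>();
        inputs.add(seed);
        for (FileMeta f : nextLevel) {
            if (f.overlaps(seed.minKey(), seed.maxKey())) {
                inputs.add(f);
            }
        }
        return inputs;
    }

    /**
     * Step 6: choose the first file that ends after the remembered key,
     * wrapping around to the first file once the end of the level is reached.
     * The level is assumed to be non-empty and sorted by key.
     */
    static FileMeta nextSeed(List<FileMeta> levelSortedByKey, String lastCompactedMaxKey) {
        for (FileMeta f : levelSortedByKey) {
            if (lastCompactedMaxKey == null
                    || f.maxKey().compareTo(lastCompactedMaxKey) > 0) {
                return f;
            }
        }
        return levelSortedByKey.get(0);
    }
}
```

Remembering the last compacted key spreads compactions across the whole key space of a level instead of repeatedly compacting the same front range.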


TODO:
1. Stable compaction
2. Value updates and deletes.
3. Scheduling compaction in a background thread.
4. Table recreation and manifest file.
5. Search improvement (cache of blocks)
6. Improve overall code and clean up.
7. Unit tests & integration tests
8. Benchmark
9. Maven deploy
10. Great readme explaining how to install and use, benchmarks, limitations, ideas, motivation and future work, with a pictorial representation of the architecture and the SST.
11. Handle architectural shortcomings, for example the checksum check.

For updates, we can iterate in sorted order from latest to oldest,
and in sstPersist we can use a set; that way each key is kept only once and old values are discarded.
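
A minimal sketch of that latest-wins idea, assuming the inputs arrive ordered from newest to oldest. `LatestWinsSketch`, `Entry`, and `keepLatest` are hypothetical names; this is not the actual `sstPersist` code, and deletes (tombstones) are not handled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LatestWinsSketch {
    /** Hypothetical key-value pair. */
    record Entry(String key, String value) {}

    /**
     * Walks the sources newest first and keeps only the first value seen
     * for each key, so older values for the same key are dropped.
     */
    static List<Entry> keepLatest(List<List<Entry>> sourcesNewestFirst) {
        Set<String> seen = new HashSet<>();
        List<Entry> out = new ArrayList<>();
        for (List<Entry> source : sourcesNewestFirst) {
            for (Entry e : source) {
                if (seen.add(e.key())) { // add() returns false for a key already kept
                    out.add(e);
                }
            }
        }
        out.sort(Comparator.comparing(Entry::key)); // SSTs store keys in sorted order
        return out;
    }
}
```

For example, a newer source holding `a=2, c=9` merged with an older one holding `a=1, b=5` yields `a=2, b=5, c=9`. A streaming merge over sorted iterators would avoid holding every key in memory; the set is used here only because it matches the note above.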

https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
https://blog.senx.io/demystifying-leveldb/
https://stackoverflow.com/questions/61684116/compaction-causes-write-to-hang-until-finished
https://tonyz93.blogspot.com/2016/11/leveldb-source-reading-3-compaction.html
https://www.speedb.io/blog-posts/understanding-leveled-compaction
https://github.com/google/leveldb/blob/main/db/version_set.cc
https://www.google.com/search?q=what+is+a+weak+key+map+guava+java&sca_esv=be2d3384baa617c2&sca_upv=1&rlz=1C1CHBF_enIN1024IN1024&biw=1536&bih=695&sxsrf=ACQVn0_daYOv836fgUD-zntx6kJ9qE1WNg%3A1712689998040&ei=TpMVZtiVAsanseMPvvmVkA8&ved=0ahUKEwjY9MW367WFAxXGU2wGHb58BfIQ4dUDCBA&uact=5&oq=what+is+a+weak+key+map+guava+java&gs_lp=Egxnd3Mtd2l6LXNlcnAaAhgCIiF3aGF0IGlzIGEgd2VhayBrZXkgbWFwIGd1YXZhIGphdmEyBxAhGAoYoAEyBxAhGAoYoAEyBxAhGAoYoAEyBxAhGAoYoAFI9RxQhQRYoRtwBHgBkAEAmAH6AaABjxOqAQUwLjUuN7gBA8gBAPgBAZgCEKACqxPCAgoQABhHGNYEGLADwgIEECMYJ8ICChAhGAoYoAEYiwPCAgQQIRgVmAMAiAYBkAYIkgcFNC41LjegB-cu&sclient=gws-wiz-serp
https://stackoverflow.com/questions/48139062/behaviour-of-caffeine-cache-asmap-views