-/* This code is almost entirely based on suffix from BurntSushi. The original
-* program was licensed under the MIT license. We have modified it for
-* for two reasons:
-*
-* 1. The original implementation used u32 indices to point into the
-* suffix array. This is smaller and fairly cache efficient, but here
-* in the Real World we have to work with Big Data and our datasets
-* are bigger than 2^32 bytes. So we have to work with u64 instead.
-*
-* 2. The original implementation had a utf8 interface. This is very
-* convenient if you're working with strings, but we are working with
-* byte arrays almost exclusively, and so just cut out the strings.
-*
-* When the comments below contradict these two statements, that's why.
-*/
 extern crate utf16_literal;
 
 use rayon::prelude::*;
 use serde::{Deserialize, Serialize};
 use std::{fmt, ops::Deref, u64};
 
 /// A suffix table is a sequence of lexicographically sorted suffixes.
-///
-/// This is distinct from a suffix array in that it *only* contains
-/// suffix indices. It has no "enhanced" information like the inverse suffix
-/// table or least-common-prefix lengths (LCP array). This representation
-/// limits what you can do (and how fast), but it uses very little memory
-/// (4 bytes per character in the text).
 #[derive(Clone, Deserialize, Eq, PartialEq, Serialize)]
 pub struct SuffixTable<T = Box<[u16]>, U = Box<[u64]>> {
     text: T,