A trie is a tree-like data structure in which every node stores a character. A trie can store one or multiple words. If a trie stores all suffixes of a word, it is called a suffix trie. After you build a suffix trie, all substrings of a word can be retrieved easily. This post is about using suffix trie to store and retrieve all substrings (including all suffixes) in a word.

suffix trie diagram

All substrings in a word are contiguous sequences of characters within a string with all possible lengths. For example, a word “ana”, its substrings include “a”, “an”, “ana”, “n”, “na”, total 5 of them. Note, that all substrings are different from all suffixes of a word. In the example of “ana”, its suffixes are “ana”, “na”, “a”, total of 3 of them. A suffix trie holds all substrings, including suffixes, of a word.

The implementation in this post uses an array data structure to store children, which is used to find the next character in the branch. The alternative solution uses a hashmap to store children. The implementation using an array is more intuitive. But each node allocates a size of 26 for the array. Most spaces are empty, which is not efficient.

Table of Content


Map of suffix trie implementations

you are here

Part 1 – Suffix trie implementation using array.
Part 2 – Suffix trie implementation using hash map


Define a SuffixTrieNode class

A SuffixTrieNode is a class to create nodes in a suffix trie. It has 2 variables.  children is a data structure that points to the node that stores the next character in one suffix (branch). The data structure is an array. Since we use only the alphabet a-z, the length of the array is 26. indices is a list data structure that stores the start position of a substring in a word.

Java

Javascript

Python


Build a suffix trie

A SuffixTrie class has one variable root. All operations, such as insertion, search, or traversal, start from the root. In a suffix trie, only the root has multiple children. Other nodes except leaves have one child, which is the node of the next character in the suffix. You can initialize the root in the constructor.

To build a suffix trie from a word, you use the word as the input of the constructor. You subtract suffixes from the word and call the insertSuffix method with the root, a suffix, and the index of the starting position of the suffix in the word as input.

insertSuffix method is a recursive function. The first thing is to save the index in the input node’s indices. It records the position of the substring inside the input node. It will be used in search. If the input string is null or its length is 0, it is a terminal condition. All recursive calls return to calling functions.

Otherwise, you retrieve the first character of the input string. Use the character as the index to check the input node’s children. If there is no value in that cell, create a suffixTrieNode as a child node and save it in that cell. Then the method calls itself with the new node, the substring after the first character, and the index.

Java

Javascript

Python

Search is to find a substring in the word. The search method is a wrapper method of searchHelper. The start node is the root. searchHelper is a recursive method. When the input string is null or the length is 0, it is a terminal condition, you return from the recursive calls. Otherwise, you take the first character of the input string. Use the character as the index to check whether there is a value in the cell of the input node’s children. If there is a value in that cell, the value is the node with the next character. You get the node from the cell so that you can keep searching for the following characters. The recursive method calls itself with the node and the substring after the first character. If the cell doesn’t have a value, return null. 

Java

Javascript

Python


Print all nodes in trie

To print all nodes in a trie, you use recursion again. When the input node is null, it is a terminal condition, you can return. Otherwise, you print the input string, which is the substring from the root to the current node. Then you check all cells in the node’s children with the indices letter a-z. If there is a value at one cell, the value is the next node in this branch. The method calls itself using this node and the input string appended with the letter.

Java

Javascript

Python


Free download

Download suffix trie implementation using Array in Java, JavaScript and Python
Trie implmentation using array
Data structures introduction PDF