Module org.apache.arrow.algorithm
Class SearchDictionaryEncoder<E extends BaseIntVector,D extends ValueVector>
java.lang.Object
org.apache.arrow.algorithm.dictionary.SearchDictionaryEncoder<E,D>
- Type Parameters:
E
- encoded vector type.
D
- decoded vector type, which is also the dictionary type.
- All Implemented Interfaces:
DictionaryEncoder<E,D>
public class SearchDictionaryEncoder<E extends BaseIntVector,D extends ValueVector>
extends Object
implements DictionaryEncoder<E,D>
Dictionary encoder based on searching.
-
Constructor Summary
Constructor
Description
SearchDictionaryEncoder(D dictionary, VectorValueComparator<D> comparator)
Constructs a dictionary encoder.
SearchDictionaryEncoder(D dictionary, VectorValueComparator<D> comparator, boolean encodeNull)
Constructs a dictionary encoder.
-
Method Summary
-
Constructor Details
-
SearchDictionaryEncoder
Constructs a dictionary encoder.
- Parameters:
dictionary
- the dictionary. It must be in sorted order.
comparator
- the criteria for sorting.
-
SearchDictionaryEncoder
public SearchDictionaryEncoder(D dictionary, VectorValueComparator<D> comparator, boolean encodeNull)
Constructs a dictionary encoder.
- Parameters:
dictionary
- the dictionary. It must be in sorted order.
comparator
- the criteria for sorting.
encodeNull
- a flag indicating whether nulls should be encoded. It determines how null values in the input are handled during encoding. When a null is encountered in the input:
1) If the flag is set to true, the encoder searches for the null in the dictionary and outputs its index in the dictionary.
2) If the flag is set to false, the encoder simply produces a null in the output.
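The two encodeNull behaviors described above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: plain arrays stand in for Arrow vectors, a null slot in the Integer[] result stands in for a cleared validity bit, and the class name EncodeNullSketch is hypothetical. Two further assumptions are made for illustration only: nulls sort before all other values in the dictionary, and a value that is missing from the dictionary is reported with an exception.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustration only: plain arrays stand in for Arrow vectors. */
class EncodeNullSketch {

  /**
   * Encodes input against a sorted dictionary. A null slot in the result plays
   * the role of a cleared validity bit in the output vector.
   */
  static Integer[] encode(String[] sortedDictionary, String[] input, boolean encodeNull) {
    // Assumption: nulls sort before all non-null values.
    Comparator<String> comparator = Comparator.nullsFirst(Comparator.<String>naturalOrder());
    // "Fresh" output: every slot starts out as null.
    Integer[] output = new Integer[input.length];
    for (int i = 0; i < input.length; i++) {
      if (input[i] == null && !encodeNull) {
        continue; // flag is false: leave a null in the output
      }
      // Flag is true, or the value is non-null: look the value up in the dictionary.
      int index = Arrays.binarySearch(sortedDictionary, input[i], comparator);
      if (index < 0) {
        // Assumption: a value missing from the dictionary is treated as an error.
        throw new IllegalArgumentException("Value not found in dictionary at position " + i);
      }
      output[i] = index;
    }
    return output;
  }

  public static void main(String[] args) {
    String[] dictionary = {null, "apple", "banana"}; // sorted, nulls first
    String[] input = {"banana", null, "apple"};
    System.out.println(Arrays.toString(encode(dictionary, input, true)));  // [2, 0, 1]
    System.out.println(Arrays.toString(encode(dictionary, input, false))); // [2, null, 1]
  }
}
```

With the flag set to true, the dictionary in this sketch has to contain a null entry for null inputs to be found; with the flag set to false, null inputs never reach the search.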
-
-
Method Details
-
encode
Encodes an input vector by binary search. The algorithm takes O(n * log(m)) time, where n is the length of the input vector and m is the length of the dictionary.
- Specified by:
encode in interface DictionaryEncoder<E extends BaseIntVector,D extends ValueVector>
- Parameters:
input
- the input vector.
output
- the output vector. It must be in a fresh state; at a minimum, all of its validity bits must be clear.
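The O(n * log(m)) bound follows from running one binary search over the m dictionary entries for each of the n input values. The sketch below shows this under simplifying assumptions: int arrays stand in for Arrow vectors, the class name BinarySearchEncodeSketch and its probes counter are hypothetical, and a value missing from the dictionary is written as -1. It is not the library's implementation.

```java
import java.util.Arrays;

/** Illustration only: int arrays stand in for Arrow vectors. */
class BinarySearchEncodeSketch {

  /** Number of dictionary probes made so far; used only to illustrate the cost. */
  static long probes = 0;

  /** Returns the position of key in the sorted dictionary, or -1 if it is absent. */
  static int search(int[] sortedDictionary, int key) {
    int low = 0;
    int high = sortedDictionary.length - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1; // overflow-safe midpoint
      probes++;
      if (sortedDictionary[mid] < key) {
        low = mid + 1;  // key can only be in the upper half
      } else if (sortedDictionary[mid] > key) {
        high = mid - 1; // key can only be in the lower half
      } else {
        return mid;
      }
    }
    return -1;
  }

  /** Encodes each of the n input values with one O(log m) search. */
  static int[] encode(int[] sortedDictionary, int[] input) {
    int[] output = new int[input.length];
    for (int i = 0; i < input.length; i++) {
      output[i] = search(sortedDictionary, input[i]);
    }
    return output;
  }

  public static void main(String[] args) {
    int[] dictionary = {10, 20, 30, 40, 50, 60, 70, 80}; // m = 8, sorted
    int[] input = {70, 10, 40, 40, 80};                   // n = 5
    System.out.println(Arrays.toString(encode(dictionary, input))); // [6, 0, 3, 3, 7]
    System.out.println("probes = " + probes);
  }
}
```

For m = 8 each search makes at most 4 probes, so the 5 inputs need at most 20 probes in total; a linear scan of the dictionary could need up to 8 per input.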
-