org.apache.commons.codec.language.bm
public class PhoneticEngine extends Object
Converts words into potential phonetic representations.
This is a two-stage process. First, the word is converted into a phonetic representation that takes into account the likely source language. Next, this phonetic representation is converted into a pan-European 'average' representation, allowing comparison between different versions of essentially the same word from different languages.
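The toy sketch below illustrates the two-stage idea: two spellings of the same name from different languages diverge after the language-specific stage but converge after the language-independent one. Everything in it is hypothetical. The class TwoStageSketch, the tiny rule tables and the first-match scanning loop are far simpler than the real rule sets, and the caller picks the language's rules directly instead of guessing the language.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Toy two-stage encoder; every rule table here is hypothetical. */
final class TwoStageSketch {

    // Stage one: language-specific rules (pattern -> phoneme), tried in insertion order.
    static final Map<String, String> GERMAN = new LinkedHashMap<>();
    static final Map<String, String> ENGLISH = new LinkedHashMap<>();
    // Stage two: language-independent rules mapping to a shared "average" form.
    static final Map<String, String> FINAL = new LinkedHashMap<>();

    static {
        GERMAN.put("sch", "S");  // German "sch" ...
        ENGLISH.put("sh", "S");  // ... and English "sh" share one intermediate symbol,
        FINAL.put("S", "s");     // which the final stage folds into plain "s"
    }

    /** Scans left to right; at each offset the first matching pattern wins, else the char is copied. */
    static String apply(String input, Map<String, String> rules) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                if (input.startsWith(rule.getKey(), i)) {
                    out.append(rule.getValue());
                    i += rule.getKey().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** Stage one with the chosen language's rules, then stage two with the shared rules. */
    static String encode(String word, Map<String, String> languageRules) {
        String languageSpecific = apply(word.toLowerCase(Locale.ROOT), languageRules);
        return apply(languageSpecific, FINAL);
    }

    public static void main(String[] args) {
        System.out.println(encode("Schulz", GERMAN)); // sulz
        System.out.println(encode("Shulz", ENGLISH)); // sulz
    }
}
```

Both inputs become "Sulz" after stage one and "sulz" after stage two, which is what makes the two spellings comparable.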
This class is intentionally immutable. If you wish to alter the settings for a PhoneticEngine, you must make a new one with the updated settings. This makes the class thread-safe.
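The immutability pattern described above can be sketched as follows. Every field is final and set once in the constructor, and there are no setters. Because of this, once a thread obtains a reference it sees fully initialised values, and the object can be shared without locking. "Changing" a setting means constructing a new object. The class is a hypothetical illustration: EngineSettings and withConcat are not part of Commons Codec (with PhoneticEngine you simply call the constructor again), and plain strings stand in for the NameType and RuleType values.

```java
/**
 * Hypothetical immutable settings holder illustrating the pattern; not a Commons Codec class.
 * Plain strings stand in for the NameType and RuleType values.
 */
final class EngineSettings {
    // All fields are final and assigned once, so no instance can change after construction.
    private final String nameType;
    private final String ruleType;
    private final boolean concat;

    EngineSettings(String nameType, String ruleType, boolean concat) {
        this.nameType = nameType;
        this.ruleType = ruleType;
        this.concat = concat;
    }

    String getNameType() { return nameType; }
    String getRuleType() { return ruleType; }
    boolean isConcat() { return concat; }

    /** "Altering" a setting returns a new instance; this one is left untouched. */
    EngineSettings withConcat(boolean newConcat) {
        return new EngineSettings(nameType, ruleType, newConcat);
    }

    public static void main(String[] args) {
        EngineSettings shared = new EngineSettings("GENERIC", "APPROX", true);
        EngineSettings variant = shared.withConcat(false);
        System.out.println(shared.isConcat() + " " + variant.isConcat()); // true false
    }
}
```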
Ported from phoneticengine.php
Modifier and Type | Class and Description |
---|---|
(package private) static class | PhoneticEngine.PhonemeBuilder: Utility for manipulating a set of phonemes as they are being built up. |
private static class | PhoneticEngine.RulesApplication: A function closure capturing the application of a list of rules to an input sequence at a particular offset. |
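The two nested helpers can be pictured with the following simplified model. It is hypothetical rather than a copy of the real classes. SimpleRule, PhonemeSetBuilder and RuleStep are invented names, rules here match a literal pattern only, and an unmatched character is copied through unchanged. The builder keeps every alternative spelling at once, so a rule with two candidate phonemes doubles the set. The rule step tries the rules in order at one offset and reports how many characters it consumed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of applying a rule list at one offset; the first matching rule wins. */
final class RuleStep {

    /** Applies the first rule whose pattern matches at offset; returns the characters consumed. */
    static int invoke(List<SimpleRule> rules, String input, int offset, PhonemeSetBuilder builder) {
        for (SimpleRule rule : rules) {
            if (input.startsWith(rule.pattern, offset)) {
                builder.apply(rule.phonemes);
                return rule.pattern.length();
            }
        }
        // Sketch choice: no rule matched, so copy the character through and advance by one.
        builder.apply(Collections.singletonList(String.valueOf(input.charAt(offset))));
        return 1;
    }

    /** Walks the input left to right, accumulating every alternative encoding. */
    static Set<String> encode(List<SimpleRule> rules, String input) {
        PhonemeSetBuilder builder = new PhonemeSetBuilder();
        int i = 0;
        while (i < input.length()) {
            i += invoke(rules, input, i, builder);
        }
        return builder.result();
    }

    public static void main(String[] args) {
        List<SimpleRule> rules = Arrays.asList(
                new SimpleRule("ch", Arrays.asList("x", "S")), // ambiguous: two candidate sounds
                new SimpleRule("a", Arrays.asList("a")));
        System.out.println(encode(rules, "bach")); // [bax, baS]
    }
}

/** Hypothetical rule: a literal, non-empty pattern plus the alternative phonemes it produces. */
final class SimpleRule {
    final String pattern; // must be non-empty, otherwise the scan would never advance
    final List<String> phonemes;

    SimpleRule(String pattern, List<String> phonemes) {
        this.pattern = pattern;
        this.phonemes = phonemes;
    }
}

/** Hypothetical builder holding every alternative phoneme string built so far. */
final class PhonemeSetBuilder {
    private Set<String> alternatives = new LinkedHashSet<>();

    PhonemeSetBuilder() {
        alternatives.add(""); // start from a single empty alternative
    }

    /** Extends each alternative with each candidate phoneme (a cross product). */
    void apply(List<String> candidates) {
        Set<String> next = new LinkedHashSet<>();
        for (String prefix : alternatives) {
            for (String candidate : candidates) {
                next.add(prefix + candidate);
            }
        }
        alternatives = next;
    }

    Set<String> result() {
        return Collections.unmodifiableSet(alternatives);
    }
}
```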
Modifier and Type | Field and Description |
---|---|
private boolean | concat |
private Lang | lang |
private static Map<NameType,Set<String>> | NAME_PREFIXES |
private NameType | nameType |
private RuleType | ruleType |
Constructor and Description |
---|
PhoneticEngine(NameType nameType, RuleType ruleType, boolean concat): Generates a new, fully-configured phonetic engine. |
Modifier and Type | Method and Description |
---|---|
private PhoneticEngine.PhonemeBuilder | applyFinalRules(PhoneticEngine.PhonemeBuilder phonemeBuilder, List<Rule> finalRules): Applies the final rules to convert from a language-specific phonetic representation to a language-independent representation. |
private static CharSequence | cacheSubSequence(CharSequence cached): This is a performance hack to avoid the overhead associated with very frequent CharSequence.subSequence calls. |
String | encode(String input): Encodes a string to its phonetic representation. |
String | encode(String input, Languages.LanguageSet languageSet): Encodes an input string into an output phonetic representation, given a set of possible origin languages. |
Lang | getLang(): Gets the Lang language-guessing rules being used. |
NameType | getNameType(): Gets the NameType being used. |
RuleType | getRuleType(): Gets the RuleType being used. |
boolean | isConcat(): Gets whether multiple phonetic encodings are concatenated or only the first one is kept. |
private static String | join(Iterable<String> strings, String sep): Joins some strings with an internal separator. |
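The cacheSubSequence entry describes memoising subSequence results. The wrapper below is a sketch of that idea under our own naming: MemoCharSequence is hypothetical, not a Commons Codec class. It allocates an n-by-n table up front and returns the same object for repeated (start, end) requests. This trades quadratic memory for fewer allocations when rules probe the same ranges many times. It is not synchronized, so it suits short-lived, single-threaded use such as a single encoding pass.

```java
/** Hypothetical CharSequence wrapper that memoises subSequence results. */
final class MemoCharSequence implements CharSequence {
    private final CharSequence delegate;
    // cache[start][end - 1] holds the sub-sequence [start, end); quadratic in the input length.
    private final CharSequence[][] cache;

    MemoCharSequence(CharSequence delegate) {
        this.delegate = delegate;
        int n = delegate.length();
        this.cache = new CharSequence[n][n];
    }

    @Override
    public int length() {
        return delegate.length();
    }

    @Override
    public char charAt(int index) {
        return delegate.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start == end) {
            return ""; // empty range: nothing worth caching
        }
        CharSequence res = cache[start][end - 1];
        if (res == null) {
            res = delegate.subSequence(start, end); // computed at most once per (start, end)
            cache[start][end - 1] = res;
        }
        return res;
    }

    @Override
    public String toString() {
        return delegate.toString();
    }

    public static void main(String[] args) {
        CharSequence word = new MemoCharSequence("schwarz");
        CharSequence first = word.subSequence(0, 3);
        CharSequence second = word.subSequence(0, 3);
        System.out.println(first + " " + (first == second)); // sch true
    }
}
```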
private final Lang lang
private final NameType nameType
private final RuleType ruleType
private final boolean concat
private static CharSequence cacheSubSequence(CharSequence cached)
Parameters:
cached - the character sequence to cache
Returns:
a CharSequence that internally memoises subSequence values
private static String join(Iterable<String> strings, String sep)
Parameters:
strings - Strings to join
sep - String to separate them with
Returns:
strings interleaved by sep
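join places sep only between consecutive elements, never before the first or after the last. A minimal sketch of such a helper follows; the class name JoinSketch is hypothetical, since the real method is private to PhoneticEngine.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical stand-in for a private join(Iterable<String>, String) helper. */
final class JoinSketch {

    /** Joins strings with sep between consecutive elements (no leading or trailing sep). */
    static String join(Iterable<String> strings, String sep) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = strings.iterator();
        if (it.hasNext()) {
            sb.append(it.next()); // first element has no separator in front of it
        }
        while (it.hasNext()) {
            sb.append(sep).append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("van", "helsing"), " ")); // van helsing
    }
}
```

On Java 8 and later, the standard String.join(sep, strings) gives the same result for an Iterable of strings; a hand-written loop like this is what older JDKs require.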
private PhoneticEngine.PhonemeBuilder applyFinalRules(PhoneticEngine.PhonemeBuilder phonemeBuilder, List<Rule> finalRules)
Parameters:
phonemeBuilder - the phonemes built so far
finalRules - the final rules to apply
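Because the builder may hold several alternatives, the final stage has to rewrite each of them. Alternatives that differed only in language-specific symbols can then collapse into one. The sketch below shows that effect with a hypothetical per-character substitution map standing in for real rules; FinalRulesSketch and its signature are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical final-stage step over a set of alternative phoneme strings. */
final class FinalRulesSketch {

    /** Rewrites every alternative with the same substitutions; the Set drops duplicates. */
    static Set<String> applyFinalRules(Set<String> alternatives, Map<Character, Character> finalRules) {
        Set<String> out = new LinkedHashSet<>();
        for (String alt : alternatives) {
            StringBuilder sb = new StringBuilder(alt.length());
            for (char c : alt.toCharArray()) {
                Character mapped = finalRules.get(c);
                sb.append(mapped != null ? mapped.charValue() : c); // unmapped chars pass through
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Character, Character> finalRules = new HashMap<>();
        finalRules.put('S', 's'); // hypothetical: fold a language-specific symbol into plain "s"
        Set<String> alts = new LinkedHashSet<>(Arrays.asList("Sulz", "sulz", "zulz"));
        System.out.println(applyFinalRules(alts, finalRules)); // [sulz, zulz]
    }
}
```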
public String encode(String input)
Parameters:
input - the String to encode
public String encode(String input, Languages.LanguageSet languageSet)
Parameters:
input - String to phoneticise; a String with dashes or spaces separating each word
languageSet - the set of possible origin languages
public Lang getLang()
public NameType getNameType()
public RuleType getRuleType()
public boolean isConcat()
commons-codec version 1.6-SNAPSHOT - Copyright © 2002-2014 - Apache Software Foundation