net.htmlparser.jericho
public final class Config extends Object
All of the properties in this class are static, affecting all objects and threads. Multiple concurrent configurations are not possible.
Properties that relate to user agent compatibility issues are stored in instances of the Config.CompatibilityMode class.
This allows all of the properties in the compatibility mode to be set as a block by setting the static CurrentCompatibilityMode property to a different instance.
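Because every property is static, a temporary change is seen by all threads until it is undone. The following is a minimal sketch of a save-and-restore pattern, using a hypothetical stand-in class (`StaticConfigDemo` and its `newLine` field are illustrative only and are not part of the library). It does not make concurrent use safe; it only guarantees the previous value is put back.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a class whose configuration is held in static fields. */
class StaticConfigDemo {
    // Stand-in for a static property such as Config.NewLine; not the library's field.
    static String newLine = "\n";

    /** Runs the task with a temporary setting, then restores the previous value. */
    static <T> T withNewLine(String temporary, Supplier<T> task) {
        String previous = newLine;
        newLine = temporary;       // visible to every thread from this point on
        try {
            return task.get();
        } finally {
            newLine = previous;    // restored even if the task throws
        }
    }

    public static void main(String[] args) {
        String joined = withNewLine("\r\n", () -> "a" + newLine + "b");
        System.out.println(joined.length());        // a, CR, LF, b
        System.out.println(newLine.equals("\n"));   // previous value is back
    }
}
```

The same shape would apply to any of the static properties listed below, provided no other thread relies on the setting while the task runs.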
Modifier and Type | Class and Description |
---|---|
static class | Config.CompatibilityMode: Represents a set of configuration parameters that relate to user agent compatibility issues. |
Modifier and Type | Field and Description |
---|---|
static String | ColumnMultipleValueSeparator: Determines the string used to separate a single column's multiple values in the output of the FormFields.getColumnValues(Map) method. |
static String | ColumnValueFalse: Determines the string that represents the value false in the output of the FormFields.getColumnValues(Map) method. |
static String | ColumnValueTrue: Determines the string that represents the value true in the output of the FormFields.getColumnValues(Map) method. |
static boolean | ConvertNonBreakingSpaces: Determines whether the CharacterReference.decode(CharSequence) and similar methods convert non-breaking space (&nbsp;) character references to normal spaces. |
static Config.CompatibilityMode | CurrentCompatibilityMode: Determines the currently active compatibility mode. |
static boolean | IsApostropheEncoded: Determines whether apostrophes are encoded when calling the CharacterReference.encode(CharSequence) method. |
static LoggerProvider | LoggerProvider: Determines the LoggerProvider that is used to create the default Logger object for each new Source object. |
static String | NewLine: Determines the string used to represent a newline in text output throughout the library. |
public static String ColumnMultipleValueSeparator
Determines the string used to separate a single column's multiple values in the output of the FormFields.getColumnValues(Map) method.
The situation where a single column has multiple values only arises if FormField.getUserValueCount() > 1 on the relevant form field, which usually indicates a poorly designed form.
The default value is "," (a comma, not including the quotes).
Must not be null.
public static String ColumnValueTrue
Determines the string that represents the value true in the output of the FormFields.getColumnValues(Map) method.
The default value is "true" (without the quotes).
Must not be null.
public static String ColumnValueFalse
Determines the string that represents the value false in the output of the FormFields.getColumnValues(Map) method.
The default value is null, which represents no output at all.
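As a rough illustration of how the three column properties above could shape a cell value, here is a minimal sketch in plain Java. The class `ColumnValueSketch`, its fields, and its methods are hypothetical stand-ins that only mirror the documented defaults; the sketch does not call FormFields.getColumnValues(Map) and assumes that a boolean column corresponds to a field that is either submitted or not.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of how the three column properties could shape a cell value. */
class ColumnValueSketch {
    // Stand-ins mirroring the documented defaults; these are not the library's fields.
    static String multipleValueSeparator = ",";
    static String valueTrue = "true";
    static String valueFalse = null;   // null stands for "no output at all"

    /** Joins several submitted values for one column into a single cell. */
    static String multiValueCell(List<String> values) {
        return String.join(multipleValueSeparator, values);
    }

    /** Renders a boolean column as a cell (assumed to be a yes/no style field). */
    static String booleanCell(boolean selected) {
        return selected ? valueTrue : valueFalse;
    }

    public static void main(String[] args) {
        System.out.println(multiValueCell(Arrays.asList("red", "green")));
        System.out.println(booleanCell(true));
        System.out.println(booleanCell(false) == null);
    }
}
```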
public static boolean ConvertNonBreakingSpaces
Determines whether the CharacterReference.decode(CharSequence) and similar methods convert non-breaking space (&nbsp;) character references to normal spaces.
The default value is true.
When this property is set to false, non-breaking space (&nbsp;) character references are decoded as non-breaking space characters (U+00A0) instead of being converted to normal spaces (U+0020).
The default behaviour of the library reflects the fact that non-breaking space character references are almost always used in HTML documents as a non-collapsing white space character. Converting them to the correct character code U+00A0, which is represented by a visible character in many older character sets, was confusing to most users who expected to see only normal spaces. The most common example of this is its visualisation as the character á in the MS-DOS CP437 character set.
The functionality of the following methods is affected:
CharacterReference.appendCharTo(Appendable)
CharacterReference.decode(CharSequence)
CharacterReference.decode(CharSequence, boolean insideAttributeValue)
CharacterReference.decodeCollapseWhiteSpace(CharSequence)
CharacterReference.reencode(CharSequence)
Attribute.getValue()
Attributes.getValue(String name)
Attributes.populateMap(Map, boolean convertNamesToLowerCase)
StartTag.getAttributeValue(String attributeName)
Element.getAttributeValue(String attributeName)
FormControl.getPredefinedValues()
OutputDocument.replace(Attributes, boolean convertNamesToLowerCase)
Renderer.getConvertNonBreakingSpaces()
TextExtractor.getConvertNonBreakingSpaces()
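The difference between the two settings can be shown with a deliberately simplified sketch in plain Java. The class `NbspDecodeSketch` and its `decodeNbsp` method are hypothetical and handle only the `&nbsp;` and `&#160;` references; they are not the library's decoder and serve only to make the U+0020 versus U+00A0 distinction concrete.

```java
/** Simplified illustration; handles only the &nbsp; and &#160; references. */
class NbspDecodeSketch {
    static String decodeNbsp(String text, boolean convertNonBreakingSpaces) {
        // true: normal space (U+0020); false: non-breaking space (U+00A0)
        String replacement = convertNonBreakingSpaces ? " " : "\u00A0";
        return text.replace("&nbsp;", replacement).replace("&#160;", replacement);
    }

    public static void main(String[] args) {
        String html = "10&nbsp;km";
        String converted = decodeNbsp(html, true);
        String preserved = decodeNbsp(html, false);
        System.out.println((int) converted.charAt(2));  // 32
        System.out.println((int) preserved.charAt(2));  // 160
    }
}
```

Both results look identical when printed in most modern fonts, which is why the example prints the code points instead of the strings.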
public static Config.CompatibilityMode CurrentCompatibilityMode
Determines the currently active compatibility mode.
The default setting is Config.CompatibilityMode.IE (MS Internet Explorer 6.0).
Must not be null.
public static boolean IsApostropheEncoded
Determines whether apostrophes are encoded when calling the CharacterReference.encode(CharSequence) method.
A value of false means apostrophe (U+0027) characters are not encoded.
The only time apostrophes need to be encoded is within an attribute value delimited by single quotes (apostrophes), so in most cases ignoring apostrophes is perfectly safe and enhances the readability of the source document.
Note that apostrophes are always encoded as a numeric character reference, never as the character entity reference &apos;.
The default value is false.
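A minimal sketch of the effect of this flag, written as a small stand-alone encoder in plain Java. The class `ApostropheEncodeSketch` is hypothetical and covers only five characters; the use of the decimal form `&#39;` is an assumption, since the text above only states that a numeric character reference is used.

```java
/** Simplified encoder covering only &, <, >, " and the apostrophe. */
class ApostropheEncodeSketch {
    static String encode(CharSequence text, boolean isApostropheEncoded) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'':
                    // Numeric reference (assumed decimal form), never &apos;
                    out.append(isApostropheEncoded ? "&#39;" : "'");
                    break;
                default: out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Tom's <b>", false));  // Tom's &lt;b&gt;
        System.out.println(encode("Tom's <b>", true));   // Tom&#39;s &lt;b&gt;
    }
}
```

Encoding the apostrophe matters only when the result is placed inside an attribute value delimited by single quotes, as noted above.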
public static LoggerProvider LoggerProvider
Determines the LoggerProvider that is used to create the default Logger object for each new Source object.
The LoggerProvider interface contains several predefined LoggerProvider instances which this property can be set to, mostly representing wrappers to common logging frameworks.
The default value is null, which results in the auto-detection of the most appropriate logging mechanism according to the following algorithm:
- If org.slf4j.impl.StaticLoggerBinder is detected:
  - If org.slf4j.impl.JDK14LoggerFactory is detected, use LoggerProvider.JAVA.
  - If org.slf4j.impl.Log4jLoggerFactory is detected, use LoggerProvider.LOG4J.
  - If org.slf4j.impl.JCLLoggerFactory is NOT detected, use LoggerProvider.SLF4J.
- If org.apache.commons.logging.Log is detected:
  Create an instance of it using the commons-logging LogFactory class.
  - If the created Log is of type org.apache.commons.logging.impl.Jdk14Logger, use LoggerProvider.JAVA.
  - If the created Log is of type org.apache.commons.logging.impl.Log4JLogger, use LoggerProvider.LOG4J.
  - Otherwise, use LoggerProvider.JCL.
- If org.apache.log4j.Logger is detected, use LoggerProvider.LOG4J.
- Otherwise, use LoggerProvider.JAVA.
See also: Source.setLogger(Logger)
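The auto-detection order described for the LoggerProvider property can be sketched as a small decision function. This is not the library's implementation: the class `LoggerDetectionSketch`, the string return values standing in for the LoggerProvider constants, and the two functional parameters are all hypothetical. It also assumes that when SLF4J is present but bound to JCL, detection falls through to the commons-logging step.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.Supplier;

class LoggerDetectionSketch {
    /** Returns the name of the LoggerProvider constant the documented steps would select. */
    static String detect(Predicate<String> classDetected, Supplier<String> createdJclLogType) {
        if (classDetected.test("org.slf4j.impl.StaticLoggerBinder")) {
            if (classDetected.test("org.slf4j.impl.JDK14LoggerFactory")) return "JAVA";
            if (classDetected.test("org.slf4j.impl.Log4jLoggerFactory")) return "LOG4J";
            if (!classDetected.test("org.slf4j.impl.JCLLoggerFactory")) return "SLF4J";
            // Assumption: SLF4J bound to JCL falls through to the commons-logging step.
        }
        if (classDetected.test("org.apache.commons.logging.Log")) {
            String type = createdJclLogType.get();   // type of the Log created via LogFactory
            if ("org.apache.commons.logging.impl.Jdk14Logger".equals(type)) return "JAVA";
            if ("org.apache.commons.logging.impl.Log4JLogger".equals(type)) return "LOG4J";
            return "JCL";
        }
        if (classDetected.test("org.apache.log4j.Logger")) return "LOG4J";
        return "JAVA";
    }

    /** Classpath probe that a real caller might pass as the first argument. */
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, LoggerDetectionSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated classpath: SLF4J present with no recognised binding classes.
        Set<String> present = new HashSet<>(Arrays.asList("org.slf4j.impl.StaticLoggerBinder"));
        System.out.println(detect(present::contains, () -> null));   // SLF4J
        // Simulated classpath with nothing detected.
        Set<String> none = new HashSet<>();
        System.out.println(detect(none::contains, () -> null));      // JAVA
    }
}
```

Passing the class check as a predicate keeps the decision logic testable without any logging framework installed; `onClasspath` shows one way a real probe could be supplied.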