Package | Description |
---|---|
org.apache.nutch.net.urlnormalizer.basic | URL normalizer performing basic normalizations: removes default ports and dot segments in the path. |
org.apache.nutch.net.urlnormalizer.pass | Dummy URL normalizer which does not change URLs. |
org.apache.nutch.net.urlnormalizer.regex | URL normalizer with configurable rules based on regular expressions (Pattern). |
Modifier and Type | Class and Description |
---|---|
class | BasicURLNormalizer. Converts URLs to a normal form: remove dot segments in path (/./ or /../); remove default ports, e.g. |
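The two normalizations named above (dropping dot segments and default ports) can be illustrated with the JDK's `java.net.URI` alone. This is a minimal sketch, not the BasicURLNormalizer implementation: the class `BasicNormalizeSketch`, its methods, and the small table of default ports are hypothetical names and assumptions introduced only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch class; not part of Apache Nutch.
class BasicNormalizeSketch {

    // Default ports for a few common schemes (an assumption of this sketch).
    static int defaultPort(String scheme) {
        if ("http".equalsIgnoreCase(scheme)) return 80;
        if ("https".equalsIgnoreCase(scheme)) return 443;
        if ("ftp".equalsIgnoreCase(scheme)) return 21;
        return -1;
    }

    static String normalize(String url) {
        URI uri;
        try {
            // URI.normalize() removes "." segments and resolvable ".." segments.
            uri = new URI(url).normalize();
        } catch (URISyntaxException e) {
            return url; // leave unparsable input unchanged
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return uri.toString(); // relative or opaque URI: nothing more to do
        }
        // Rebuild from the raw (still percent-encoded) components.
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host);
        int port = uri.getPort();
        // Keep the port only when it is explicit and differs from the default.
        if (port != -1 && port != defaultPort(scheme)) {
            sb.append(':').append(port);
        }
        if (uri.getRawPath() != null) sb.append(uri.getRawPath());
        if (uri.getRawQuery() != null) sb.append('?').append(uri.getRawQuery());
        if (uri.getRawFragment() != null) sb.append('#').append(uri.getRawFragment());
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "http://example.com/a/c?x=1"
        System.out.println(normalize("http://example.com:80/a/./b/../c?x=1"));
    }
}
```

Non-default ports such as `:8443` are left in place, since removing them would change which server the URL addresses.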
Modifier and Type | Class and Description |
---|---|
class | PassURLNormalizer. This URLNormalizer doesn't change URLs. |
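A normalizer that leaves URLs untouched is an identity function. The sketch below shows that idea with a plain `UnaryOperator<String>`; the class `PassThroughSketch` and the constant `PASS` are hypothetical and do not reflect the actual plugin interface.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch class; not part of Apache Nutch.
class PassThroughSketch {

    // A no-op normalizer: returns its input unchanged.
    static final UnaryOperator<String> PASS = url -> url;

    public static void main(String[] args) {
        // prints "http://example.com:80/a/../b"
        System.out.println(PASS.apply("http://example.com:80/a/../b"));
    }
}
```

Such a no-op can serve as a placeholder wherever a normalizer is required but no rewriting is wanted.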
Modifier and Type | Class and Description |
---|---|
class | RegexURLNormalizer. Allows users to do regex substitutions on all/any URLs that are encountered, which is useful for stripping session IDs from URLs. |
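Regex-based stripping of session IDs can be sketched as an ordered list of pattern/substitution pairs applied one after another. The class `RegexNormalizeSketch`, its `Rule` type, and the three rules below are assumptions made for illustration; they are not the rules shipped with RegexURLNormalizer, and how the real class loads its configuration is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch class; not part of Apache Nutch.
class RegexNormalizeSketch {

    // One substitution rule: a compiled pattern plus its replacement string.
    static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(String regex, String substitution) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
        }
    }

    // Illustrative rules only (assumptions of this sketch), applied in order.
    static final List<Rule> RULES = Arrays.asList(
        // Drop a ";jsessionid=..." path parameter.
        new Rule("(?i);jsessionid=[^?#]*", ""),
        // Drop a "PHPSESSID" or "sid" query parameter, keeping the leading separator.
        new Rule("(?i)([?&])(?:phpsessid|sid)=[^&#]*&?", "$1"),
        // Remove a dangling "?" or "&" left at the end of the URL.
        new Rule("[?&]$", "")
    );

    static String normalize(String url) {
        String result = url;
        for (Rule rule : RULES) {
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "http://example.com/page?x=1"
        System.out.println(normalize("http://example.com/page;jsessionid=ABC123?x=1"));
        // prints "http://example.com/list?page=2"
        System.out.println(normalize("http://example.com/list?PHPSESSID=deadbeef&page=2"));
    }
}
```

Rule order matters in this sketch: the cleanup rule runs last so that it can tidy up separators left behind by the earlier substitutions.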
Copyright © 2015 The Apache Software Foundation