Package | Description |
---|---|
org.apache.nutch.parse | The Parse interface and related classes. |
org.apache.nutch.parse.html | An HTML document parsing plugin. |
org.apache.nutch.parse.tika | Parse various document formats with help of Apache Tika. |

Modifier and Type | Method and Description |
---|---|
Outlink[] | Parse.getOutlinks() |
static Outlink[] | OutlinkExtractor.getOutlinks(String plainText, Configuration conf) Extracts Outlink from given plain text. |
static Outlink[] | OutlinkExtractor.getOutlinks(String plainText, String anchor, Configuration conf) Extracts Outlink from given plain text and adds anchor to the extracted Outlinks. |
static Outlink | Outlink.read(DataInput in) |

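
The idea behind the plain-text extraction methods listed above can be sketched without any Nutch dependency. The following is a minimal illustration only: `OutlinkSketch`, its nested `Link` class, and the URL pattern are hypothetical stand-ins chosen for this example, and they are not the classes or the pattern that `OutlinkExtractor` itself uses. The sketch scans plain text for `http`/`https` URLs and attaches the same anchor string to every hit, roughly mirroring the three-argument `getOutlinks` overload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutlinkSketch {

    /** Hypothetical stand-in for an outlink: a target URL plus anchor text. */
    public static final class Link {
        public final String toUrl;
        public final String anchor;

        public Link(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    // Deliberately simple URL pattern (an assumption for this sketch,
    // not the pattern Nutch uses).
    private static final Pattern URL_PATTERN =
        Pattern.compile("https?://[^\\s<>\"]+");

    /** Scans plain text for URLs and attaches the same anchor to every hit. */
    public static Link[] extract(String plainText, String anchor) {
        if (plainText == null) {
            return new Link[0];
        }
        List<Link> links = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(plainText);
        while (m.find()) {
            String url = m.group();
            // Trim trailing sentence punctuation that the pattern picks up.
            while (!url.isEmpty()
                    && ".,;:!?)".indexOf(url.charAt(url.length() - 1)) >= 0) {
                url = url.substring(0, url.length() - 1);
            }
            links.add(new Link(url, anchor == null ? "" : anchor));
        }
        return links.toArray(new Link[0]);
    }

    public static void main(String[] args) {
        Link[] links = extract(
            "See http://example.org/a and https://example.com/b.", "docs");
        for (Link l : links) {
            System.out.println(l.toUrl + " [" + l.anchor + "]");
        }
    }
}
```

Returning an array keeps the sketch close to the `Outlink[]` return type shown in the table; a production extractor would also need URL validation and normalization, which this example omits.
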
Modifier and Type | Method and Description |
---|---|
void | Parse.setOutlinks(Outlink[] outlinks) |

Constructor and Description |
---|
Parse(String text, String title, Outlink[] outlinks, ParseStatus parseStatus) |

Modifier and Type | Method and Description |
---|---|
void | DOMContentUtils.getOutlinks(URL base, ArrayList<Outlink> outlinks, Node node) |

Modifier and Type | Method and Description |
---|---|
void | DOMContentUtils.getOutlinks(URL base, ArrayList<Outlink> outlinks, Node node) |

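
The `getOutlinks(URL base, ArrayList<Outlink> outlinks, Node node)` signature suggests a traversal that walks a DOM subtree and appends the links it finds to a caller-supplied list. The sketch below shows that general pattern using only JDK classes. It is an assumption about the shape of such a traversal, not the `DOMContentUtils` implementation: `DomOutlinkSketch`, `collectLinks`, and `linksOf` are hypothetical names, the list holds plain strings instead of `Outlink` objects, and only `<a href>` elements in well-formed XHTML are handled.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomOutlinkSketch {

    /** Depth-first walk that appends the resolved href of every <a> element to out. */
    public static void collectLinks(URL base, ArrayList<String> out, Node node)
            throws URISyntaxException {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())) {
            String href = ((Element) node).getAttribute("href");
            if (!href.isEmpty()) {
                // Resolve relative links against the page's base URL.
                out.add(base.toURI().resolve(href).toString());
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectLinks(base, out, children.item(i));
        }
    }

    /** Helper for this sketch: parses well-formed XHTML and collects its links. */
    public static ArrayList<String> linksOf(String baseUrl, String xhtml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xhtml.getBytes(StandardCharsets.UTF_8)));
            ArrayList<String> out = new ArrayList<>();
            collectLinks(URI.create(baseUrl).toURL(), out, root);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not extract links", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p><a href=\"b.html\">B</a></p>"
            + "<a href=\"http://example.com/c\">C</a></body></html>";
        System.out.println(linksOf("http://example.org/dir/a.html", page));
    }
}
```

Passing the output list in as a parameter, as the listed signature does, lets the recursion accumulate results across the whole subtree without allocating intermediate collections.
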
Copyright © 2015 The Apache Software Foundation