public class Ftp extends Object implements Protocol
Creates an FtpResponse object and gets the content of the url from it.
Configurable parameters are ftp.username, ftp.password, ftp.content.limit, ftp.timeout, ftp.server.timeout, ftp.keep.connection and ftp.follow.talk. For details see the "FTP properties" section in nutch-default.xml.

Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG |

Fields inherited from interface Protocol: CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID
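
The configurable parameters named in the class description can be pictured as keys that map to typed settings. The following is only a sketch under stated assumptions: java.util.Properties stands in for the Configuration object passed to setConf so that the example runs without Nutch on the classpath, the class FtpSettingsSketch is hypothetical, and the fallback values are placeholders rather than the defaults from nutch-default.xml.

```java
import java.util.Properties;

// Hypothetical stand-in for illustration; this is not the Nutch Ftp class.
public class FtpSettingsSketch {
    public final String username;
    public final String password;
    public final int contentLimit;
    public final int timeout;
    public final int serverTimeout;
    public final boolean keepConnection;
    public final boolean followTalk;

    public FtpSettingsSketch(Properties props) {
        // Fallback values are placeholders chosen for this sketch,
        // not the defaults shipped in nutch-default.xml.
        username = props.getProperty("ftp.username", "anonymous");
        password = props.getProperty("ftp.password", "user@example.org");
        contentLimit = Integer.parseInt(props.getProperty("ftp.content.limit", "65536"));
        timeout = Integer.parseInt(props.getProperty("ftp.timeout", "10000"));
        serverTimeout = Integer.parseInt(props.getProperty("ftp.server.timeout", "60000"));
        keepConnection = Boolean.parseBoolean(props.getProperty("ftp.keep.connection", "false"));
        followTalk = Boolean.parseBoolean(props.getProperty("ftp.follow.talk", "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Override two keys; the remaining keys fall back to the placeholders.
        props.setProperty("ftp.timeout", "5000");
        props.setProperty("ftp.keep.connection", "true");
        FtpSettingsSketch s = new FtpSettingsSketch(props);
        System.out.println("timeout=" + s.timeout + " keepConnection=" + s.keepConnection);
    }
}
```

Keys that are not set fall back to the placeholder values, which mirrors the usual pattern of overriding only the properties that differ from the shipped defaults.
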
Constructor and Description |
---|
Ftp() |
Modifier and Type | Method and Description |
---|---|
protected void | finalize() |
Configuration | getConf(): Get the Configuration object. |
Collection<WebPage.Field> | getFields() |
ProtocolOutput | getProtocolOutput(String url, WebPage page): Creates an FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received. |
crawlercommons.robots.BaseRobotRules | getRobotRules(String url, WebPage page): Get the robots rules for a given url. |
static void | main(String[] args): For debugging. |
void | setConf(Configuration conf): Set the Configuration object. |
void | setFollowTalk(boolean followTalk): Set followTalk. |
void | setKeepConnection(boolean keepConnection): Set keepConnection. |
void | setMaxContentLength(int length): Set the point at which content is truncated. |
void | setTimeout(int to): Set the timeout. |
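
The summary for setMaxContentLength says only that it sets the point at which content is truncated. A minimal sketch of what truncating at such a limit could look like follows; the class ContentLimitSketch and its truncate method are hypothetical and not taken from the Nutch source, and treating a negative limit as "no limit" is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of truncating content at a maximum length;
// not taken from the Nutch source.
public class ContentLimitSketch {

    // Returns the content unchanged when it fits within maxLength,
    // otherwise a copy holding only the first maxLength bytes.
    // Assumption of this sketch: a negative limit means "no limit".
    public static byte[] truncate(byte[] content, int maxLength) {
        if (maxLength < 0 || content.length <= maxLength) {
            return content;
        }
        return Arrays.copyOf(content, maxLength);
    }

    public static void main(String[] args) {
        byte[] body = "hello world".getBytes(StandardCharsets.US_ASCII);
        byte[] cut = truncate(body, 5);
        System.out.println(new String(cut, StandardCharsets.US_ASCII)); // prints "hello"
    }
}
```
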
public void setTimeout(int to)
public void setMaxContentLength(int length)
public void setFollowTalk(boolean followTalk)
public void setKeepConnection(boolean keepConnection)
public ProtocolOutput getProtocolOutput(String url, WebPage page)
Creates an FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received.
Specified by: getProtocolOutput in interface Protocol
Parameters:
url - Text containing the ftp url
page - The WebPage object corresponding to the url
Returns: ProtocolOutput object for the url

public void setConf(Configuration conf)
Set the Configuration object.
Specified by: setConf in interface Configurable

public Configuration getConf()
Get the Configuration object.
Specified by: getConf in interface Configurable

public Collection<WebPage.Field> getFields()
Specified by: getFields in interface FieldPluggable

public crawlercommons.robots.BaseRobotRules getRobotRules(String url, WebPage page)
Specified by: getRobotRules in interface Protocol
Parameters:
url - url to check

Copyright © 2015 The Apache Software Foundation