public class FetcherJob extends NutchTool implements Tool
Modifier and Type | Class and Description |
---|---|
static class | FetcherJob.FetcherMapper: Mapper class for Fetcher. |
Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG |
static String | PARSE_KEY |
static int | PERM_REFRESH_TIME |
static String | PROTOCOL_REDIR |
static org.apache.avro.util.Utf8 | REDIRECT_DISCOVERED |
static String | RESUME_KEY |
static String | THREADS_KEY |
Fields inherited from class NutchTool: currentJob, currentJobNum, numJobs, results, status
Constructor and Description |
---|
FetcherJob() |
FetcherJob(Configuration conf) |
Modifier and Type | Method and Description |
---|---|
int | fetch(String batchId, int threads, boolean shouldResume, int numTasks): Run fetcher. |
Collection<WebPage.Field> | getFields(Job job) |
static void | main(String[] args) |
Map<String,Object> | run(Map<String,Object> args): Runs the tool, using a map of arguments. |
int | run(String[] args) |
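Methods inherited from class NutchTool: getProgress, getStatus, killJob, stopJob
Methods inherited from class Configured: getConf, setConf
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Configurable: getConf, setConf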
public static final String PROTOCOL_REDIR
public static final int PERM_REFRESH_TIME
public static final org.apache.avro.util.Utf8 REDIRECT_DISCOVERED
public static final String RESUME_KEY
public static final String PARSE_KEY
public static final String THREADS_KEY
public static final org.slf4j.Logger LOG
public FetcherJob()
public FetcherJob(Configuration conf)
public Collection<WebPage.Field> getFields(Job job)
public Map<String,Object> run(Map<String,Object> args) throws Exception
Specified by: run in class NutchTool
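`run(Map<String,Object>)` takes its arguments as a map rather than as positional parameters. The sketch below is a minimal, JDK-only illustration of packing the same four values that `fetch(...)` accepts into such a map. The key names (`batch`, `threads`, `resume`, `numTasks`) and the `toArgMap` helper are hypothetical placeholders, not the argument constants Nutch actually uses, which this page does not list.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: building the Map<String,Object> accepted by run(Map).
 * All key names and the toArgMap helper are HYPOTHETICAL, for illustration only.
 */
final class FetchArgsSketch {
    // Hypothetical argument keys (not Nutch's real constants).
    static final String ARG_BATCH = "batch";
    static final String ARG_THREADS = "threads";
    static final String ARG_RESUME = "resume";
    static final String ARG_NUMTASKS = "numTasks";

    /** Builds an argument map from alternating key/value pairs, skipping null values. */
    static Map<String, Object> toArgMap(Object... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, Object> args = new HashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            Object value = pairs[i + 1];
            if (value != null) { // an absent optional argument is simply left out
                args.put(String.valueOf(pairs[i]), value);
            }
        }
        return args;
    }

    public static void main(String[] argv) {
        Map<String, Object> args = toArgMap(
                ARG_BATCH, null,     // null batchId: fetch all generated fetchlists
                ARG_THREADS, 10,     // threads per task
                ARG_RESUME, false,
                ARG_NUMTASKS, -1);   // below 1: fall back to the default
        System.out.println(args.containsKey(ARG_BATCH)); // false
        System.out.println(args.get(ARG_THREADS));       // 10
    }
}
```

Dropping null values is a design choice of this sketch: it lets a consumer treat "key absent" as "use the default", which matches the documented meaning of a null `batchId`.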
public int fetch(String batchId, int threads, boolean shouldResume, int numTasks) throws Exception
Parameters:
batchId - batchId (obtained from Generator) or null to fetch all generated fetchlists
threads - number of threads per map task
shouldResume -
numTasks - number of fetching tasks (reducers). If set to a value below 1, the default is used, which is mapred.map.tasks.
Throws:
Exception
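Two of the `fetch` parameters carry fallback rules: a null `batchId` selects all generated fetchlists, and a `numTasks` value below 1 falls back to `mapred.map.tasks`. The JDK-only sketch below restates just those two rules; the class and method names are hypothetical, and the `mapred.map.tasks` value is passed in as a plain parameter rather than read from a Hadoop `Configuration`.

```java
/**
 * Sketch of the documented fallback rules for fetch(...) arguments.
 * Class and method names are HYPOTHETICAL; this is not Nutch's implementation.
 */
final class FetchParamsSketch {

    /** A null batchId means "fetch all generated fetchlists". */
    static boolean fetchesAllBatches(String batchId) {
        return batchId == null;
    }

    /**
     * Effective number of fetching tasks: the caller's value when it is at least 1,
     * otherwise the value configured for mapred.map.tasks (supplied by the caller here).
     */
    static int effectiveNumTasks(int numTasks, int mapredMapTasks) {
        return numTasks < 1 ? mapredMapTasks : numTasks;
    }

    public static void main(String[] args) {
        System.out.println(fetchesAllBatches(null));   // true
        System.out.println(effectiveNumTasks(-1, 4));  // 4 (falls back to the default)
        System.out.println(effectiveNumTasks(8, 4));   // 8 (explicit value wins)
    }
}
```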
Copyright © 2015 The Apache Software Foundation