public class Utils
extends Object
Constructor and Description |
---|
Utils() |
Modifier and Type | Method and Description |
---|---|
static String |
BACKUP_STANDALONE_MASTER_PREFIX()
An identifier that backup masters use in their responses.
|
static String |
buildLocationMetadata(scala.collection.Seq<org.apache.hadoop.fs.Path> paths,
int stopAppendingThreshold)
Convert a sequence of
Path s to a metadata string. |
static String |
bytesToString(scala.math.BigInt size) |
static String |
bytesToString(long size)
Convert a quantity in bytes to a human-readable string such as "4.0 MiB".
|
static long |
byteStringAsBytes(String str)
Convert a passed byte string (e.g.
|
static long |
byteStringAsGb(String str)
Convert a passed byte string (e.g.
|
static long |
byteStringAsKb(String str)
Convert a passed byte string (e.g.
|
static long |
byteStringAsMb(String str)
Convert a passed byte string (e.g.
|
static String |
checkAndGetK8sMasterUrl(String rawMasterURL)
Check the validity of the given Kubernetes master URL and return the resolved URL.
|
static void |
checkHost(String host)
Checks if the host contains only valid hostname/ip without port
NOTE: Incase of IPV6 ip it should be enclosed inside []
|
static void |
checkHostPort(String hostPort) |
static long |
checkOffHeapEnabled(SparkConf sparkConf,
long offHeapSize)
return 0 if MEMORY_OFFHEAP_ENABLED is false.
|
static boolean |
chmod700(java.io.File file)
JDK equivalent of
chmod 700 file . |
static <C> Class<C> |
classForName(String className,
boolean initialize,
boolean noSparkClassLoader)
Preferred alternative to Class.forName(className), as well as
Class.forName(className, initialize, loader) with current thread's ContextClassLoader.
|
static boolean |
classIsLoadable(String clazz)
Determines whether the provided class is loadable in the current thread.
|
static <T> T |
clone(T value,
SerializerInstance serializer,
scala.reflect.ClassTag<T> evidence$2)
Clone an object using a Spark serializer.
|
static java.util.Properties |
cloneProperties(java.util.Properties props)
Create a new properties object with the same values as `props`
|
static void |
copyFileStreamNIO(java.nio.channels.FileChannel input,
java.nio.channels.WritableByteChannel output,
long startPosition,
long bytesToCopy) |
static long |
copyStream(java.io.InputStream in,
java.io.OutputStream out,
boolean closeStreams,
boolean transferToEnabled)
Copy all data from an InputStream to an OutputStream.
|
static java.io.InputStream |
copyStreamUpTo(java.io.InputStream in,
long maxSize)
Copy the first
maxSize bytes of data from the InputStream to an in-memory
buffer, primarily to check for corruption. |
static boolean |
createDirectory(java.io.File dir)
Create a directory given the abstract pathname
|
static java.io.File |
createDirectory(String root,
String namePrefix)
Create a directory inside the given parent directory.
|
static String |
createFailedToGetTokenMessage(String serviceName,
Throwable e)
Returns a string message about delegation token generation failure
|
static String |
createSecret(SparkConf conf) |
static java.io.File |
createTempDir(String root,
String namePrefix)
Create a temporary directory inside the given parent directory.
|
static String |
decodeFileNameInURI(java.net.URI uri)
Get the file name from uri's raw path and decode it.
|
static int |
DEFAULT_DRIVER_MEM_MB()
Define a default value for driver memory here since this value is referenced across the code
base and nearly all files already use Utils.scala
|
static void |
deleteRecursively(java.io.File file)
Delete a file or directory and its contents recursively.
|
static <T> T |
deserialize(byte[] bytes)
Deserialize an object using Java serialization
|
static <T> T |
deserialize(byte[] bytes,
ClassLoader loader)
Deserialize an object using Java serialization and the given ClassLoader
|
static long |
deserializeLongValue(byte[] bytes)
Deserialize a Long value (used for
org.apache.spark.api.python.PythonPartitioner ) |
static void |
deserializeViaNestedStream(java.io.InputStream is,
SerializerInstance ser,
scala.Function1<DeserializationStream,scala.runtime.BoxedUnit> f)
Deserialize via nested stream using specific serializer
|
static boolean |
doesDirectoryContainAnyNewFiles(java.io.File dir,
long cutoff)
Determines if a directory contains any files newer than cutoff seconds.
|
static java.io.File |
doFetchFile(String url,
java.io.File targetDir,
String filename,
SparkConf conf,
org.apache.hadoop.conf.Configuration hadoopConf)
Download a file or directory to target directory.
|
static scala.collection.immutable.Set<String> |
EMPTY_USER_GROUPS() |
static String |
encodeFileNameToURIRawPath(String fileName)
A file name may contain some invalid URI characters, such as " ".
|
static String |
exceptionString(Throwable e)
Return a nice string representation of the exception.
|
static String |
executeAndGetOutput(scala.collection.Seq<String> command,
java.io.File workingDir,
scala.collection.Map<String,String> extraEnvironment,
boolean redirectStderr)
Execute a command and get its output, throwing an exception if it yields a code other than 0.
|
static Process |
executeCommand(scala.collection.Seq<String> command,
java.io.File workingDir,
scala.collection.Map<String,String> extraEnvironment,
boolean redirectStderr)
Execute a command and return the process running the command.
|
static int |
executorOffHeapMemorySizeAsMb(SparkConf sparkConf)
Convert MEMORY_OFFHEAP_SIZE to MB Unit, return 0 if MEMORY_OFFHEAP_ENABLED is false.
|
static scala.Tuple2<String,Object> |
extractHostPortFromSparkUrl(String sparkUrl)
Return a pair of host and port extracted from the
sparkUrl . |
static java.io.File |
fetchFile(String url,
java.io.File targetDir,
SparkConf conf,
org.apache.hadoop.conf.Configuration hadoopConf,
long timestamp,
boolean useCache,
boolean shouldUntar)
Download a file or directory to target directory.
|
static org.apache.spark.util.CallSite |
getCallSite(scala.Function1<String,Object> skipClass)
When called inside a class in the spark package, returns the name of the user code class
(outside the spark package) that called into Spark, as well as which Spark method they called.
|
static String[] |
getConfiguredLocalDirs(SparkConf conf)
Return the configured local directories where Spark can write files.
|
static ClassLoader |
getContextOrSparkClassLoader()
Get the Context ClassLoader on this thread or, if not present, the ClassLoader that
loaded Spark.
|
static scala.collection.immutable.Set<String> |
getCurrentUserGroups(SparkConf sparkConf,
String username) |
static String |
getCurrentUserName()
Returns the current user name.
|
static String |
getDefaultPropertiesFile(scala.collection.Map<String,String> env)
Return the path of the default Spark properties file.
|
static int |
getDynamicAllocationInitialExecutors(SparkConf conf)
Return the initial number of executors for dynamic allocation.
|
static long |
getFileLength(java.io.File file,
SparkConf workConf)
Return the file length, if the file is compressed it returns the uncompressed file length.
|
static String |
getFormattedClassName(Object obj)
Return the class name of the given object, removing all dollar signs
|
static org.apache.hadoop.fs.FileSystem |
getHadoopFileSystem(String path,
org.apache.hadoop.conf.Configuration conf)
Return a Hadoop FileSystem with the scheme encoded in the given path.
|
static org.apache.hadoop.fs.FileSystem |
getHadoopFileSystem(java.net.URI path,
org.apache.hadoop.conf.Configuration conf)
Return a Hadoop FileSystem with the scheme encoded in the given path.
|
static long |
getIteratorSize(scala.collection.Iterator<?> iterator)
Counts the number of elements of an iterator using a while loop rather than calling
TraversableOnce.size() because it uses a for loop, which is slightly slower
in the current version of Scala. |
static <T> scala.collection.Iterator<scala.Tuple2<T,Object>> |
getIteratorZipWithIndex(scala.collection.Iterator<T> iter,
long startIndex)
Generate a zipWithIndex iterator, avoid index value overflowing problem
in scala's zipWithIndex
|
static String |
getLocalDir(SparkConf conf)
Get the path of a temporary directory.
|
static scala.collection.Seq<String> |
getLocalUserJarsForShell(SparkConf conf)
Return the local jar files which will be added to REPL's classpath.
|
static String |
getProcessName()
Returns the name of this JVM process.
|
static scala.collection.Map<String,String> |
getPropertiesFromFile(String filename)
Load properties present in the given file.
|
static String |
getSimpleName(Class<?> cls)
Safer than Class obj's getSimpleName which may throw Malformed class name error in scala.
|
static ClassLoader |
getSparkClassLoader()
Get the ClassLoader which loaded Spark.
|
static String |
getSparkOrYarnConfig(SparkConf conf,
String key,
String default_)
Return the value of a config either through the SparkConf or the Hadoop configuration.
|
static scala.Option<String> |
getStderr(Process process,
long timeoutMs)
Return the stderr of a process after waiting for the process to terminate.
|
static scala.collection.Map<String,String> |
getSystemProperties()
Returns the system properties map that is thread-safe to iterator over.
|
static ThreadStackTrace[] |
getThreadDump()
Return a thread dump of all threads' stacktraces.
|
static scala.Option<ThreadStackTrace> |
getThreadDumpForThread(long threadId) |
static String |
getUsedTimeNs(long startTimeNs)
Return the string to tell how long has passed in milliseconds.
|
static scala.collection.Seq<String> |
getUserJars(SparkConf conf)
Return the jar files pointed by the "spark.jars" property.
|
static void |
initDaemon(org.slf4j.Logger log)
Utility function that should be called early in
main() for daemons to set up some common
diagnostic state. |
static <T> T |
instantiateSerializerFromConf(org.apache.spark.internal.config.ConfigEntry<String> propertyName,
SparkConf conf,
boolean isDriver) |
static <T> T |
instantiateSerializerOrShuffleManager(String className,
SparkConf conf,
boolean isDriver) |
static boolean |
isAbsoluteURI(String path)
Check whether a path is an absolute URI.
|
static boolean |
isBindCollision(Throwable exception)
Return whether the exception is caused by an address-port collision when binding.
|
static boolean |
isClientMode(SparkConf conf) |
static boolean |
isDynamicAllocationEnabled(SparkConf conf)
Return whether dynamic allocation is enabled in the given conf.
|
static boolean |
isFatalError(Throwable e)
Returns true if the given exception was fatal.
|
static boolean |
isFileSplittable(org.apache.hadoop.fs.Path path,
org.apache.hadoop.io.compress.CompressionCodecFactory codecFactory)
Check whether the file of the path is splittable.
|
static boolean |
isInDirectory(java.io.File parent,
java.io.File child)
Return whether the specified file is a parent directory of the child file.
|
static boolean |
isInRunningSparkTask()
Returns if the current codes are running in a Spark task, e.g., in executors.
|
static boolean |
isLocalMaster(SparkConf conf) |
static boolean |
isLocalUri(String uri)
Returns whether the URI is a "local:" URI.
|
static boolean |
isMac()
Whether the underlying operating system is Mac OS X.
|
static boolean |
isMacOnAppleSilicon()
Whether the underlying operating system is Mac OS X and processor is Apple Silicon.
|
static boolean |
isMemberClass(Class<?> cls)
Returns true if and only if the underlying class is a member class.
|
static boolean |
isPushBasedShuffleEnabled(SparkConf conf,
boolean isDriver,
boolean checkSerializer)
Push based shuffle can only be enabled when below conditions are met:
- the application is submitted to run in YARN mode
- external shuffle service enabled
- IO encryption disabled
- serializer(such as KryoSerializer) supports relocation of serialized objects
|
static boolean |
isStreamingDynamicAllocationEnabled(SparkConf conf) |
static boolean |
isTesting()
Indicates whether Spark is currently running unit tests.
|
static boolean |
isWindows()
Whether the underlying operating system is Windows.
|
static String |
libraryPathEnvName()
Return the current system LD_LIBRARY_PATH name
|
static String |
libraryPathEnvPrefix(scala.collection.Seq<String> libraryPaths)
Return the prefix of a command that appends the given library paths to the
system-specific library path environment variable.
|
static String |
loadDefaultSparkProperties(SparkConf conf,
String filePath)
Load default Spark properties from the given file.
|
static <T> scala.collection.Seq<T> |
loadExtensions(Class<T> extClass,
scala.collection.Seq<String> classes,
SparkConf conf)
Create instances of extension classes.
|
static String |
LOCAL_SCHEME()
Scheme used for files that are locally available on worker nodes in the cluster.
|
static String |
localCanonicalHostName()
Get the local machine's FQDN.
|
static String |
localHostName()
Get the local machine's hostname.
|
static String |
localHostNameForURI()
Get the local machine's URI.
|
static <T> T |
logUncaughtExceptions(scala.Function0<T> f)
Execute the given block, logging and re-throwing any uncaught exception.
|
static int |
MAX_DIR_CREATION_ATTEMPTS() |
static long |
median(long[] sizes,
boolean alreadySorted)
Return the median number of a long array
|
static String |
megabytesToString(long megabytes)
Convert a quantity in megabytes to a human-readable string such as "4.0 MiB".
|
static int |
memoryStringToMb(String str)
Convert a Java memory parameter passed to -Xmx (such as 300m or 1g) to a number of mebibytes.
|
static String |
msDurationToString(long ms)
Returns a human-readable string representing a duration such as "35ms"
|
static String[] |
nonLocalPaths(String paths,
boolean testWindows)
Return all non-local paths from a comma-separated list of paths.
|
static int |
nonNegativeHash(Object obj) |
static int |
nonNegativeMod(int x,
int mod) |
static String |
offsetBytes(scala.collection.Seq<java.io.File> files,
scala.collection.Seq<Object> fileLengths,
long start,
long end)
Return a string containing data across a set of files.
|
static String |
offsetBytes(String path,
long length,
long start,
long end)
Return a string containing part of a file from byte 'start' to 'end'.
|
static void |
org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) |
static org.slf4j.Logger |
org$apache$spark$internal$Logging$$log_() |
static scala.Tuple2<String,Object> |
parseHostPort(String hostPort) |
static String[] |
parseStandaloneMasterUrls(String masterUrls)
Split the comma delimited string of master URLs into a list.
|
static int |
portMaxRetries(SparkConf conf)
Maximum number of retries when binding to a port before giving up.
|
static Thread |
processStreamByLine(String threadName,
java.io.InputStream inputStream,
scala.Function1<String,scala.runtime.BoxedUnit> processLine)
Return and start a daemon thread that processes the content of the input stream line by line.
|
static java.util.Random |
random() |
static <T> scala.collection.Seq<T> |
randomize(scala.collection.TraversableOnce<T> seq,
scala.reflect.ClassTag<T> evidence$1)
Shuffle the elements of a collection into a random order, returning the
result in a new collection.
|
static <T> Object |
randomizeInPlace(Object arr,
java.util.Random rand)
Shuffle the elements of an array into a random order, modifying the
original array.
|
static java.io.File[] |
recursiveList(java.io.File f)
Lists files recursively.
|
static scala.collection.Seq<scala.Tuple2<String,String>> |
redact(scala.collection.Map<String,String> kvs)
Looks up the redaction regex from within the key value pairs and uses it to redact the rest
of the key value pairs.
|
static <K,V> scala.collection.Seq<scala.Tuple2<K,V>> |
redact(scala.Option<scala.util.matching.Regex> regex,
scala.collection.Seq<scala.Tuple2<K,V>> kvs)
Redact the sensitive values in the given map.
|
static String |
redact(scala.Option<scala.util.matching.Regex> regex,
String text)
Redact the sensitive information in the given string.
|
static scala.collection.Seq<scala.Tuple2<String,String>> |
redact(SparkConf conf,
scala.collection.Seq<scala.Tuple2<String,String>> kvs)
Redact the sensitive values in the given map.
|
static scala.collection.Seq<String> |
redactCommandLineArgs(SparkConf conf,
scala.collection.Seq<String> commands) |
static java.net.URI |
resolveURI(String path)
Return a well-formed URI for the file described by a user input string.
|
static String |
resolveURIs(String paths)
Resolve a comma-separated list of paths.
|
static boolean |
responseFromBackup(String msg)
Return true if the response message is sent from a backup Master on standby.
|
static String |
sanitizeDirName(String str) |
static <T> byte[] |
serialize(T o)
Serialize an object using Java serialization
|
static void |
serializeViaNestedStream(java.io.OutputStream os,
SerializerInstance ser,
scala.Function1<SerializationStream,scala.runtime.BoxedUnit> f)
Serialize via nested stream using specific serializer
|
static void |
setCustomHostname(String hostname)
Allow setting a custom host name because when we run on Mesos we need to use the same
hostname it reports to the master.
|
static void |
setLogLevel(org.apache.logging.log4j.Level l)
configure a new log4j level
|
static scala.collection.Seq<String> |
sparkJavaOpts(SparkConf conf,
scala.Function1<String,Object> filterKey)
Convert all spark properties set in the given SparkConf to a sequence of java options.
|
static scala.collection.Seq<String> |
splitCommandString(String s)
Split a string of potentially quoted arguments from the command line the way that a shell
would do it to determine arguments to a command.
|
static <T> scala.Tuple2<T,Object> |
startServiceOnPort(int startPort,
scala.Function1<Object,scala.Tuple2<T,Object>> startService,
SparkConf conf,
String serviceName)
Attempt to start a service on the given port, or fail after a number of attempts.
|
static int |
stringHalfWidth(String str)
Return the number of half widths in a given string.
|
static scala.collection.Seq<String> |
stringToSeq(String str) |
static String |
substituteAppId(String opt,
String appId)
Replaces all the {{APP_ID}} occurrences with the App Id.
|
static String |
substituteAppNExecIds(String opt,
String appId,
String execId)
Replaces all the {{EXECUTOR_ID}} occurrences with the Executor Id
and {{APP_ID}} occurrences with the App Id.
|
static void |
symlink(java.io.File src,
java.io.File dst)
Creates a symlink.
|
static java.io.File |
tempFileWith(java.io.File path)
Returns a path of temporary file which is in the same directory with
path . |
static scala.Option<Object> |
terminateProcess(Process process,
long timeoutMs)
Terminates a process waiting for at most the specified duration.
|
static long |
timeIt(int numIters,
scala.Function0<scala.runtime.BoxedUnit> f,
scala.Option<scala.Function0<scala.runtime.BoxedUnit>> prepare)
Timing method based on iterations that permit JVM JIT optimization.
|
static void |
times(int numIters,
scala.Function0<scala.runtime.BoxedUnit> f)
Method executed for repeating a task for side effects.
|
static long |
timeStringAsMs(String str)
Convert a time parameter such as (50s, 100ms, or 250us) to milliseconds for internal use.
|
static long |
timeStringAsSeconds(String str)
Convert a time parameter such as (50s, 100ms, or 250us) to seconds for internal use.
|
static <T> scala.Tuple2<T,Object> |
timeTakenMs(scala.Function0<T> body)
Records the duration of running `body`.
|
static <T> scala.util.Try<T> |
tryLog(scala.Function0<T> f)
Executes the given block in a Try, logging any uncaught exceptions.
|
static void |
tryLogNonFatalError(scala.Function0<scala.runtime.BoxedUnit> block)
Executes the given block.
|
static void |
tryOrExit(scala.Function0<scala.runtime.BoxedUnit> block)
Execute a block of code that evaluates to Unit, forwarding any uncaught exceptions to the
default UncaughtExceptionHandler
|
static <T> T |
tryOrIOException(scala.Function0<T> block)
Execute a block of code that returns a value, re-throwing any non-fatal uncaught
exceptions as IOException.
|
static void |
tryOrStopSparkContext(SparkContext sc,
scala.Function0<scala.runtime.BoxedUnit> block)
Execute a block of code that evaluates to Unit, stop SparkContext if there is any uncaught
exception
|
static <R extends java.io.Closeable,T> |
tryWithResource(scala.Function0<R> createResource,
scala.Function1<R,T> f) |
static <T> T |
tryWithSafeFinally(scala.Function0<T> block,
scala.Function0<scala.runtime.BoxedUnit> finallyBlock)
Execute a block of code, then a finally block, but if exceptions happen in
the finally block, do not suppress the original exception.
|
static <T> T |
tryWithSafeFinallyAndFailureCallbacks(scala.Function0<T> block,
scala.Function0<scala.runtime.BoxedUnit> catchBlock,
scala.Function0<scala.runtime.BoxedUnit> finallyBlock)
Execute a block of code and call the failure callbacks in the catch block.
|
static void |
unpack(java.io.File source,
java.io.File dest)
Unpacks an archive file into the specified directory.
|
static scala.collection.Seq<java.io.File> |
unzipFilesFromFile(org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.fs.Path dfsZipFile,
java.io.File localDir)
Decompress a zip file into a local dir.
|
static void |
updateSparkConfigFromProperties(SparkConf conf,
scala.collection.Map<String,String> properties)
Updates Spark config with properties from a set of Properties.
|
static int |
userPort(int base,
int offset)
Returns the user port to try when trying to bind a service.
|
static void |
validateURL(java.net.URI uri)
Validate that a given URI is actually a valid URL as well.
|
static String |
weakIntern(String s)
String interning to reduce the memory usage.
|
static scala.util.matching.Regex |
windowsDrive()
Pattern for matching a Windows drive, which contains only a single alphabet character.
|
static <T> T |
withContextClassLoader(ClassLoader ctxClassLoader,
scala.Function0<T> fn)
Run a segment of code using a different context class loader in the current thread
|
static <T> T |
withDummyCallSite(SparkContext sc,
scala.Function0<T> body)
To avoid calling
Utils.getCallSite for every single RDD we create in the body,
set a dummy call site that RDDs use instead. |
static void |
writeByteBuffer(java.nio.ByteBuffer bb,
java.io.DataOutput out)
Primitive often used when writing
ByteBuffer to DataOutput |
static void |
writeByteBuffer(java.nio.ByteBuffer bb,
java.io.OutputStream out)
Primitive often used when writing
ByteBuffer to OutputStream |
public static java.util.Random random()
public static int DEFAULT_DRIVER_MEM_MB()
public static int MAX_DIR_CREATION_ATTEMPTS()
public static String LOCAL_SCHEME()
public static <T> byte[] serialize(T o)
public static <T> T deserialize(byte[] bytes)
public static <T> T deserialize(byte[] bytes, ClassLoader loader)
public static long deserializeLongValue(byte[] bytes)
org.apache.spark.api.python.PythonPartitioner
)public static void serializeViaNestedStream(java.io.OutputStream os, SerializerInstance ser, scala.Function1<SerializationStream,scala.runtime.BoxedUnit> f)
public static void deserializeViaNestedStream(java.io.InputStream is, SerializerInstance ser, scala.Function1<DeserializationStream,scala.runtime.BoxedUnit> f)
public static String weakIntern(String s)
public static ClassLoader getSparkClassLoader()
public static ClassLoader getContextOrSparkClassLoader()
This should be used whenever passing a ClassLoader to Class.ForName or finding the currently active loader when setting up ClassLoader delegation chains.
public static boolean classIsLoadable(String clazz)
public static <C> Class<C> classForName(String className, boolean initialize, boolean noSparkClassLoader)
className
- (undocumented)initialize
- (undocumented)noSparkClassLoader
- (undocumented)public static <T> T withContextClassLoader(ClassLoader ctxClassLoader, scala.Function0<T> fn)
ctxClassLoader
- (undocumented)fn
- (undocumented)public static void writeByteBuffer(java.nio.ByteBuffer bb, java.io.DataOutput out)
ByteBuffer
to DataOutput
bb
- (undocumented)out
- (undocumented)public static void writeByteBuffer(java.nio.ByteBuffer bb, java.io.OutputStream out)
ByteBuffer
to OutputStream
bb
- (undocumented)out
- (undocumented)public static boolean chmod700(java.io.File file)
chmod 700 file
.
file
- the file whose permissions will be modifiedpublic static boolean createDirectory(java.io.File dir)
dir
- (undocumented)public static java.io.File createDirectory(String root, String namePrefix)
root
- (undocumented)namePrefix
- (undocumented)public static java.io.File createTempDir(String root, String namePrefix)
root
- (undocumented)namePrefix
- (undocumented)public static long copyStream(java.io.InputStream in, java.io.OutputStream out, boolean closeStreams, boolean transferToEnabled)
in
- (undocumented)out
- (undocumented)closeStreams
- (undocumented)transferToEnabled
- (undocumented)public static java.io.InputStream copyStreamUpTo(java.io.InputStream in, long maxSize)
maxSize
bytes of data from the InputStream to an in-memory
buffer, primarily to check for corruption.
This returns a new InputStream which contains the same data as the original input stream. It may be entirely on in-memory buffer, or it may be a combination of in-memory data, and then continue to read from the original stream. The only real use of this is if the original input stream will potentially detect corruption while the data is being read (e.g. from compression). This allows for an eager check of corruption in the first maxSize bytes of data.
in
- (undocumented)maxSize
- (undocumented)public static void copyFileStreamNIO(java.nio.channels.FileChannel input, java.nio.channels.WritableByteChannel output, long startPosition, long bytesToCopy)
public static String encodeFileNameToURIRawPath(String fileName)
java.net.URI(String)
.
Note: the file name must not contain "/" or "\"
fileName
- (undocumented)public static String decodeFileNameInURI(java.net.URI uri)
uri
- (undocumented)public static java.io.File fetchFile(String url, java.io.File targetDir, SparkConf conf, org.apache.hadoop.conf.Configuration hadoopConf, long timestamp, boolean useCache, boolean shouldUntar)
If useCache
is true, first attempts to fetch the file to a local cache that's shared
across executors running the same application. useCache
is used mainly for
the executors, and not in local mode.
Throws SparkException if the target file already exists and has different contents than the requested file.
If shouldUntar
is true, it untars the given url if it is a tar.gz or tgz into targetDir
.
This is a legacy behavior, and users should better use spark.archives
configuration or
SparkContext.addArchive
url
- (undocumented)targetDir
- (undocumented)conf
- (undocumented)hadoopConf
- (undocumented)timestamp
- (undocumented)useCache
- (undocumented)shouldUntar
- (undocumented)public static void unpack(java.io.File source, java.io.File dest)
org.apache.hadoop.yarn.util.FSDownload.unpack
.source
- (undocumented)dest
- (undocumented)public static <T> scala.Tuple2<T,Object> timeTakenMs(scala.Function0<T> body)
public static java.io.File doFetchFile(String url, java.io.File targetDir, String filename, SparkConf conf, org.apache.hadoop.conf.Configuration hadoopConf)
Throws SparkException if the target file already exists and has different contents than the requested file.
url
- (undocumented)targetDir
- (undocumented)filename
- (undocumented)conf
- (undocumented)hadoopConf
- (undocumented)public static void validateURL(java.net.URI uri) throws java.net.MalformedURLException
uri
- The URI to validatejava.net.MalformedURLException
public static String getLocalDir(SparkConf conf)
- If called from inside of a YARN container, this will return a directory chosen by YARN. - If the SPARK_LOCAL_DIRS environment variable is set, this will return a directory from it. - Otherwise, if the spark.local.dir is set, this will return a directory from it. - Otherwise, this will return java.io.tmpdir.
Some of these configuration options might be lists of multiple paths, but this method will always return a single directory. The return directory is chosen randomly from the array of directories it gets from getOrCreateLocalRootDirs.
conf
- (undocumented)public static boolean isInRunningSparkTask()
public static String[] getConfiguredLocalDirs(SparkConf conf)
conf
- (undocumented)public static <T> scala.collection.Seq<T> randomize(scala.collection.TraversableOnce<T> seq, scala.reflect.ClassTag<T> evidence$1)
seq
- (undocumented)evidence$1
- (undocumented)public static <T> Object randomizeInPlace(Object arr, java.util.Random rand)
arr
- (undocumented)rand
- (undocumented)public static void setCustomHostname(String hostname)
hostname
- (undocumented)public static String localCanonicalHostName()
public static String localHostName()
public static String localHostNameForURI()
public static void checkHost(String host)
host
- (undocumented)public static void checkHostPort(String hostPort)
public static scala.Tuple2<String,Object> parseHostPort(String hostPort)
public static String getUsedTimeNs(long startTimeNs)
startTimeNs
- - a timestamp in nanoseconds returned by System.nanoTime
.public static java.io.File[] recursiveList(java.io.File f)
f
- (undocumented)public static void deleteRecursively(java.io.File file)
file
- (undocumented)public static boolean doesDirectoryContainAnyNewFiles(java.io.File dir, long cutoff)
dir
- must be the path to a directory, or IllegalArgumentException is throwncutoff
- measured in seconds. Returns true if there are any files or directories in the
given directory whose last modified time is later than this many seconds agopublic static long timeStringAsMs(String str)
str
- (undocumented)public static long timeStringAsSeconds(String str)
str
- (undocumented)public static long byteStringAsBytes(String str)
If no suffix is provided, the passed number is assumed to be in bytes.
str
- (undocumented)public static long byteStringAsKb(String str)
If no suffix is provided, the passed number is assumed to be in kibibytes.
str
- (undocumented)public static long byteStringAsMb(String str)
If no suffix is provided, the passed number is assumed to be in mebibytes.
str
- (undocumented)public static long byteStringAsGb(String str)
If no suffix is provided, the passed number is assumed to be in gibibytes.
str
- (undocumented)public static int memoryStringToMb(String str)
str
- (undocumented)public static String bytesToString(long size)
size
- (undocumented)public static String bytesToString(scala.math.BigInt size)
public static String msDurationToString(long ms)
ms
- (undocumented)public static String megabytesToString(long megabytes)
megabytes
- (undocumented)public static Process executeCommand(scala.collection.Seq<String> command, java.io.File workingDir, scala.collection.Map<String,String> extraEnvironment, boolean redirectStderr)
command
- (undocumented)workingDir
- (undocumented)extraEnvironment
- (undocumented)redirectStderr
- (undocumented)public static String executeAndGetOutput(scala.collection.Seq<String> command, java.io.File workingDir, scala.collection.Map<String,String> extraEnvironment, boolean redirectStderr)
command
- (undocumented)workingDir
- (undocumented)extraEnvironment
- (undocumented)redirectStderr
- (undocumented)public static Thread processStreamByLine(String threadName, java.io.InputStream inputStream, scala.Function1<String,scala.runtime.BoxedUnit> processLine)
threadName
- (undocumented)inputStream
- (undocumented)processLine
- (undocumented)public static void tryOrExit(scala.Function0<scala.runtime.BoxedUnit> block)
NOTE: This method is to be called by the spark-started JVM process.
block
- (undocumented)public static void tryOrStopSparkContext(SparkContext sc, scala.Function0<scala.runtime.BoxedUnit> block)
NOTE: This method is to be called by the driver-side components to avoid stopping the user-started JVM process completely; in contrast, tryOrExit is to be called in the spark-started JVM process .
sc
- (undocumented)block
- (undocumented)public static <T> T tryOrIOException(scala.Function0<T> block)
block
- (undocumented)public static void tryLogNonFatalError(scala.Function0<scala.runtime.BoxedUnit> block)
public static <T> T tryWithSafeFinally(scala.Function0<T> block, scala.Function0<scala.runtime.BoxedUnit> finallyBlock)
This is primarily an issue with finally { out.close() }
blocks, where
close needs to be called to clean up out
, but if an exception happened
in out.write
, it's likely out
may be corrupted and out.close
will
fail as well. This would then suppress the original/likely more meaningful
exception from the original out.write
call.
block
- (undocumented)finallyBlock
- (undocumented)public static <T> T tryWithSafeFinallyAndFailureCallbacks(scala.Function0<T> block, scala.Function0<scala.runtime.BoxedUnit> catchBlock, scala.Function0<scala.runtime.BoxedUnit> finallyBlock)
This is primarily an issue with catch { abort() }
or finally { out.close() }
blocks,
where the abort/close needs to be called to clean up out
, but if an exception happened
in out.write
, it's likely out
may be corrupted and abort
or out.close
will
fail as well. This would then suppress the original/likely more meaningful
exception from the original out.write
call.
block
- (undocumented)catchBlock
- (undocumented)finallyBlock
- (undocumented)public static org.apache.spark.util.CallSite getCallSite(scala.Function1<String,Object> skipClass)
skipClass
- Function that is used to exclude non-user-code classes.public static long getFileLength(java.io.File file, SparkConf workConf)
file
- (undocumented)workConf
- (undocumented)public static String offsetBytes(String path, long length, long start, long end)
public static String offsetBytes(scala.collection.Seq<java.io.File> files, scala.collection.Seq<Object> fileLengths, long start, long end)
startIndex
and endIndex
is based on the cumulative size of all the files take in
the given order. See figure below for more details.files
- (undocumented)fileLengths
- (undocumented)start
- (undocumented)end
- (undocumented)public static <T> T clone(T value, SerializerInstance serializer, scala.reflect.ClassTag<T> evidence$2)
value
- (undocumented)serializer
- (undocumented)evidence$2
- (undocumented)public static scala.collection.Seq<String> splitCommandString(String s)
s
- (undocumented)public static int nonNegativeMod(int x, int mod)
public static int nonNegativeHash(Object obj)
public static scala.collection.Map<String,String> getSystemProperties()
public static void times(int numIters, scala.Function0<scala.runtime.BoxedUnit> f)
numIters
- (undocumented)f
- (undocumented)public static long timeIt(int numIters, scala.Function0<scala.runtime.BoxedUnit> f, scala.Option<scala.Function0<scala.runtime.BoxedUnit>> prepare)
numIters
- number of iterationsf
- function to be executed. If prepare is not None, the running time of each call to f
must be an order of magnitude longer than one nanosecond for accurate timing.prepare
- function to be executed before each call to f. Its running time doesn't count.public static long getIteratorSize(scala.collection.Iterator<?> iterator)
TraversableOnce.size()
because it uses a for loop, which is slightly slower
in the current version of Scala.iterator
- (undocumented)public static <T> scala.collection.Iterator<scala.Tuple2<T,Object>> getIteratorZipWithIndex(scala.collection.Iterator<T> iter, long startIndex)
iter
- (undocumented)startIndex
- (undocumented)public static void symlink(java.io.File src, java.io.File dst)
src
- absolute path to the sourcedst
- relative path for the destinationpublic static String getFormattedClassName(Object obj)
public static org.apache.hadoop.fs.FileSystem getHadoopFileSystem(java.net.URI path, org.apache.hadoop.conf.Configuration conf)
path
- (undocumented)conf
- (undocumented)public static org.apache.hadoop.fs.FileSystem getHadoopFileSystem(String path, org.apache.hadoop.conf.Configuration conf)
path
- (undocumented)conf
- (undocumented)public static boolean isWindows()
public static boolean isMac()
public static boolean isMacOnAppleSilicon()
public static scala.util.matching.Regex windowsDrive()
public static boolean isTesting()
public static scala.Option<Object> terminateProcess(Process process, long timeoutMs)
process
- (undocumented)timeoutMs
- (undocumented)public static scala.Option<String> getStderr(Process process, long timeoutMs)
process
- (undocumented)timeoutMs
- (undocumented)public static <T> T logUncaughtExceptions(scala.Function0<T> f)
f
- (undocumented)public static <T> scala.util.Try<T> tryLog(scala.Function0<T> f)
public static boolean isFatalError(Throwable e)
public static java.net.URI resolveURI(String path)
If the supplied path does not contain a scheme, or is a relative path, it will be converted into an absolute path with a file:// scheme.
path
- (undocumented)public static String resolveURIs(String paths)
public static boolean isAbsoluteURI(String path)
public static String[] nonLocalPaths(String paths, boolean testWindows)
public static String loadDefaultSparkProperties(SparkConf conf, String filePath)
conf
- (undocumented)filePath
- (undocumented)public static void updateSparkConfigFromProperties(SparkConf conf, scala.collection.Map<String,String> properties)
conf
- (undocumented)properties
- (undocumented)public static scala.collection.Map<String,String> getPropertiesFromFile(String filename)
public static String getDefaultPropertiesFile(scala.collection.Map<String,String> env)
public static String exceptionString(Throwable e)
e
- (undocumented)public static ThreadStackTrace[] getThreadDump()
public static scala.Option<ThreadStackTrace> getThreadDumpForThread(long threadId)
public static scala.collection.Seq<String> sparkJavaOpts(SparkConf conf, scala.Function1<String,Object> filterKey)
conf
- (undocumented)filterKey
- (undocumented)public static int portMaxRetries(SparkConf conf)
conf
- (undocumented)public static int userPort(int base, int offset)
base
- (undocumented)offset
- (undocumented)public static <T> scala.Tuple2<T,Object> startServiceOnPort(int startPort, scala.Function1<Object,scala.Tuple2<T,Object>> startService, SparkConf conf, String serviceName)
startPort
- The initial port to start the service on.startService
- Function to start service on a given port.
This is expected to throw java.net.BindException on port collision.conf
- A SparkConf used to get the maximum number of retries when binding to a port.serviceName
- Name of the service.public static boolean isBindCollision(Throwable exception)
exception
- (undocumented)public static void setLogLevel(org.apache.logging.log4j.Level l)
l
- (undocumented)public static String libraryPathEnvName()
public static String libraryPathEnvPrefix(scala.collection.Seq<String> libraryPaths)
libraryPaths
- (undocumented)public static String getSparkOrYarnConfig(SparkConf conf, String key, String default_)
conf
- (undocumented)key
- (undocumented)default_
- (undocumented)public static scala.Tuple2<String,Object> extractHostPortFromSparkUrl(String sparkUrl) throws SparkException
sparkUrl
.
A spark url (spark://host:port
) is a special URI that its scheme is spark
and only contains
host and port.
sparkUrl
- (undocumented)SparkException
- if sparkUrl is invalid.public static String getCurrentUserName()
SPARK_USER
environment variable.public static scala.collection.immutable.Set<String> EMPTY_USER_GROUPS()
public static scala.collection.immutable.Set<String> getCurrentUserGroups(SparkConf sparkConf, String username)
public static String[] parseStandaloneMasterUrls(String masterUrls)
masterUrls
- (undocumented)public static String BACKUP_STANDALONE_MASTER_PREFIX()
public static boolean responseFromBackup(String msg)
public static <T> T withDummyCallSite(SparkContext sc, scala.Function0<T> body)
Utils.getCallSite
for every single RDD we create in the body,
set a dummy call site that RDDs use instead. This is for performance optimization.sc
- (undocumented)body
- (undocumented)public static boolean isInDirectory(java.io.File parent, java.io.File child)
parent
- (undocumented)child
- (undocumented)public static boolean isLocalMaster(SparkConf conf)
conf
- (undocumented)public static boolean isPushBasedShuffleEnabled(SparkConf conf, boolean isDriver, boolean checkSerializer)
conf
- (undocumented)isDriver
- (undocumented)checkSerializer
- (undocumented)public static <T> T instantiateSerializerOrShuffleManager(String className, SparkConf conf, boolean isDriver)
public static <T> T instantiateSerializerFromConf(org.apache.spark.internal.config.ConfigEntry<String> propertyName, SparkConf conf, boolean isDriver)
public static boolean isDynamicAllocationEnabled(SparkConf conf)
conf
- (undocumented)public static boolean isStreamingDynamicAllocationEnabled(SparkConf conf)
public static int getDynamicAllocationInitialExecutors(SparkConf conf)
conf
- (undocumented)public static <R extends java.io.Closeable,T> T tryWithResource(scala.Function0<R> createResource, scala.Function1<R,T> f)
public static java.io.File tempFileWith(java.io.File path)
path
.path
- (undocumented)public static String getProcessName()
public static void initDaemon(org.slf4j.Logger log)
main()
for daemons to set up some common
diagnostic state.log
- (undocumented)public static scala.collection.Seq<String> getUserJars(SparkConf conf)
conf
- (undocumented)public static scala.collection.Seq<String> getLocalUserJarsForShell(SparkConf conf)
conf
- (undocumented)public static scala.collection.Seq<scala.Tuple2<String,String>> redact(SparkConf conf, scala.collection.Seq<scala.Tuple2<String,String>> kvs)
conf
- (undocumented)kvs
- (undocumented)public static <K,V> scala.collection.Seq<scala.Tuple2<K,V>> redact(scala.Option<scala.util.matching.Regex> regex, scala.collection.Seq<scala.Tuple2<K,V>> kvs)
regex
- (undocumented)kvs
- (undocumented)public static String redact(scala.Option<scala.util.matching.Regex> regex, String text)
regex
- (undocumented)text
- (undocumented)public static scala.collection.Seq<scala.Tuple2<String,String>> redact(scala.collection.Map<String,String> kvs)
kvs
- (undocumented)public static scala.collection.Seq<String> redactCommandLineArgs(SparkConf conf, scala.collection.Seq<String> commands)
public static scala.collection.Seq<String> stringToSeq(String str)
public static <T> scala.collection.Seq<T> loadExtensions(Class<T> extClass, scala.collection.Seq<String> classes, SparkConf conf)
The classes in the given list must: - Be sub-classes of the given base class. - Provide either a no-arg constructor, or a 1-arg constructor that takes a SparkConf.
The constructors are allowed to throw "UnsupportedOperationException" if the extension does not want to be registered; this allows the implementations to check the Spark configuration (or other state) and decide they do not need to be added. A log message is printed in that case. Other exceptions are bubbled up.
extClass
- (undocumented)classes
- (undocumented)conf
- (undocumented)public static String checkAndGetK8sMasterUrl(String rawMasterURL)
rawMasterURL
- (undocumented)public static String substituteAppNExecIds(String opt, String appId, String execId)
opt
- (undocumented)appId
- (undocumented)execId
- (undocumented)public static String substituteAppId(String opt, String appId)
opt
- (undocumented)appId
- (undocumented)public static String createSecret(SparkConf conf)
public static boolean isMemberClass(Class<?> cls)
Note: jdk8u throws a "Malformed class name" error if a given class is a deeply-nested inner class (See SPARK-34607 for details). This issue has already been fixed in jdk9+, so we can remove this helper method safely if we drop the support of jdk8u.
cls
- (undocumented)public static String getSimpleName(Class<?> cls)
cls
- (undocumented)public static int stringHalfWidth(String str)
For a string consisting of 1 million characters, the execution of this method requires about 50ms.
str
- (undocumented)public static String sanitizeDirName(String str)
public static boolean isClientMode(SparkConf conf)
public static boolean isLocalUri(String uri)
public static boolean isFileSplittable(org.apache.hadoop.fs.Path path, org.apache.hadoop.io.compress.CompressionCodecFactory codecFactory)
public static java.util.Properties cloneProperties(java.util.Properties props)
public static String buildLocationMetadata(scala.collection.Seq<org.apache.hadoop.fs.Path> paths, int stopAppendingThreshold)
Path
s to a metadata string. When the length of metadata string
exceeds stopAppendingThreshold
, stop appending paths for saving memory.paths
- (undocumented)stopAppendingThreshold
- (undocumented)public static int executorOffHeapMemorySizeAsMb(SparkConf sparkConf)
sparkConf
- (undocumented)public static long checkOffHeapEnabled(SparkConf sparkConf, long offHeapSize)
sparkConf
- (undocumented)offHeapSize
- (undocumented)public static String createFailedToGetTokenMessage(String serviceName, Throwable e)
public static scala.collection.Seq<java.io.File> unzipFilesFromFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path dfsZipFile, java.io.File localDir)
fs
- (undocumented)dfsZipFile
- (undocumented)localDir
- (undocumented)public static long median(long[] sizes, boolean alreadySorted)
sizes
- alreadySorted
- public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)