pyspark.sql.streaming.DataStreamReader.load#
- DataStreamReader.load(path=None, format=None, schema=None, **options)[source]#
Loads a data stream from a data source and returns it as a DataFrame.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.
- Parameters
- path : str, optional
Optional string for file-system backed data sources.
- format : str, optional
Optional string for the format of the data source. Defaults to ‘parquet’.
- schema : pyspark.sql.types.StructType or str, optional
Optional pyspark.sql.types.StructType for the input schema, or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).
- **options : dict
All other string options.
Notes
This API is evolving.
Examples
Load a data stream from a temporary JSON file.
>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory(prefix="load") as d:
...     # Write a temporary JSON file to read it.
...     spark.createDataFrame(
...         [(100, "Hyukjin Kwon"),], ["age", "name"]
...     ).write.mode("overwrite").format("json").save(d)
...
...     # Start a streaming query to read the JSON file.
...     q = spark.readStream.schema(
...         "age INT, name STRING"
...     ).format("json").load(d).writeStream.format("console").start()
...     time.sleep(3)
...     q.stop()