Vectorized UDF: Scalable Analysis with Python and PySpark Li Jin (Two Sigma) from pandas numpy Watch Video
Preview(s):
Gallery
Play Video: (Note: The default playback of the video is HD VERSION. If your browser is buffering the video slowly, please play the REGULAR MP4 VERSION or Open The Video below for better experience. Thank you!)
⏲ Duration: 17 min 50 sec ✓ Published: 11-Jun-2018
Description: Over the past few years, Python has become the default language for data scientists. Packages such as pandas, numpy, statsmodel, and scikit-learn have gained great adoption and become the mainstream toolkits. At the same time, Apache Spark has become the de facto standard in processing big data. Spark ships with a Python interface, aka PySpark, however, because Spark’s runtime is implemented on top of JVM, using PySpark with native Python library sometimes results in poor performance and usabi
Play Video: (Note: The default playback of the video is HD VERSION. If your browser is buffering the video slowly, please play the REGULAR MP4 VERSION or Open The Video below for better experience. Thank you!)