Connect Jupyter to Remote Spark Clusters With Apache Toree
Scala [https://www.scala-lang.org/] is a fun language which gives you all the
power of Java [https://www.
Transpose data with Spark
A short user-defined function written in Scala that lets you transpose a DataFrame without using aggregation functions.
Convert Spark Vectors to DataFrame Columns
Vectors are typically required for machine learning tasks but are otherwise not
commonly used. Sometimes you end up with an
Pivoting data with Spark
One of the common data engineering tasks is taking a deep dataset and turning it
into a wide dataset with some
Renaming All Columns In A Spark DataFrame
Here's an easy example of how to rename all columns in an Apache Spark
DataFrame. Technically, we'
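As a rough illustration of renaming every column at once, here is a minimal sketch. It assumes a local SparkSession, and the helper name `renameAll` and the sample column names are hypothetical. It relies on `toDF`, which returns a new DataFrame with the supplied names. The linked post may take a different route.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameAllColumnsSketch {
  // Hypothetical helper: apply `f` to every column name and rebuild the
  // DataFrame with the new names. DataFrames are immutable, so this returns
  // a new DataFrame rather than changing `df` in place.
  def renameAll(df: DataFrame, f: String => String): DataFrame =
    df.toDF(df.columns.map(f): _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-all-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("Id", "First Name")

    // Lowercase every name and replace spaces with underscores.
    val renamed = renameAll(df, _.toLowerCase.replace(" ", "_"))
    renamed.printSchema()

    spark.stop()
  }
}
```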
Using Spark, Scala and XGBoost On The Titanic Dataset from Kaggle
The Titanic: Machine Learning from Disaster [https://www.kaggle.com/c/titanic]
competition on Kaggle [https://www.kaggle.com/] is
List All Additional Jars Loaded in Spark
Once in a while, you need to verify the versions of the jars that have been
loaded into your Spark
Spark Vector of Vectors
I recently ran into a problem with creating a features vector for a machine
learning project. If the number of
Joining Spark DataFrames Without Duplicate or Ambiguous Column Names
When performing joins in Spark, one question keeps coming up: When joining
multiple dataframes, how do you prevent ambiguous column
Selecting Dynamic Columns In Spark DataFrames (aka Excluding Columns)
I often need to perform an inverse selection of columns in a dataframe, or
exclude some columns from a query.
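One way to express such an inverse selection is sketched below, assuming a local SparkSession. The helper name `selectExcept` and the sample columns are hypothetical, and the linked post may use a different technique. The idea is to build the column list dynamically by filtering `df.columns` before calling `select`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ExcludeColumnsSketch {
  // Hypothetical helper: keep every column whose name is not in `excluded`.
  // Note: names containing dots would need backtick quoting before `col`.
  def selectExcept(df: DataFrame, excluded: Set[String]): DataFrame =
    df.select(df.columns.filterNot(excluded.contains).map(col): _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("exclude-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a", true), (2, "b", false)).toDF("id", "name", "flag")

    // Select everything except the "flag" column.
    val trimmed = selectExcept(df, Set("flag"))
    trimmed.printSchema()

    spark.stop()
  }
}
```

When the excluded names are known up front, `df.drop("flag")` gives the same result. The filter-based form is more useful when the exclusion list is computed at runtime, for example from a naming pattern.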