Spark large table join
Recently I was joining two very large tables (dozens of TBs of shuffle write), and the job almost always reported OOM (18.1G mem has been used…
Oct 24, 2021
Parquet column compression
If a DF has two columns with a one-to-many relationship, it’s natural to consider using group by to compress the data frame when…
Mar 19, 2021
NBA Knapsack
Suppose a team wants to maximize its chance of winning in the playoffs. It knows its winning probability against every other team, and then it wants a…
Mar 4, 2020
Bayesian setting
It is hard to imagine that only after four years did I understand what the “Bayesian setting” means. Take the mixture model item from…
Oct 8, 2019
Understanding “Boosting for Comparison-Based Learning”
Suppose that in the real world we are given a movie-user matrix and a few tags for some of the movies, such as horror and sci-fi; others are either…
Aug 20, 2019
TensorFlow 1.13.1:
In my case, the issue was that the location of libcublas changed with CUDA 10.1, which required me to update my LD_LIBRARY_PATH.
May 19, 2019