Topic modelling of articles collected from the NY Times. Scraped NY Times articles and preprocessed the text with NLTK. Trained Naive Bayes and Random Forest classifiers in Apache Spark (PySpark) to predict article topics from TF-IDF features.
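
To make the TF-IDF step concrete, here is a minimal sketch in plain Python rather than Spark. It is not taken from the repository: the function name `tf_idf` and the toy corpus are hypothetical, and the smoothed formula log((N + 1) / (df + 1)) is assumed because it is the one Spark MLlib documents for its `IDF` transformer.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (already tokenized, e.g. with NLTK).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (assumed to match Spark MLlib's IDF).
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    # Term frequency is the raw count of the term within the document.
    return [
        {term: count * idf[term] for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus standing in for preprocessed article text.
docs = [
    ["senate", "votes", "on", "budget"],
    ["team", "wins", "on", "penalty"],
    ["senate", "budget", "talks", "stall"],
]
weights = tf_idf(docs)
```

Terms that occur in fewer documents ("votes") get a larger weight than terms shared across documents ("on"), which is what lets a classifier such as Naive Bayes or a Random Forest separate topics. In a Spark pipeline the equivalent vectors would typically come from `HashingTF` followed by `IDF`.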


https://github.com/AshVijay/NYT_PySpark