Predicting Facebook User Check Ins

Kaggle is a world-famous platform for machine learning competitions, where companies and research institutes challenge Data Scientists around the world to solve predictive analytics problems using the hosts' own datasets.

In each competition, the host provides the dataset, a description of the problem they want to solve, and the evaluation metric. Participants then submit their predictions for the test set. Scores on one part of the test set are displayed on the “public leaderboard,” where participants can see their current ranking in the competition.

Scores on the rest of the test set, known as the “private leaderboard,” stay hidden from participants until the competition ends. The final ranking is determined on the private leaderboard so that participants cannot probe the public scores and overfit to them, and so that the top solutions generalize well. This keeps any Data Scientist from “gaming the system” and winning with a model that merely memorizes known data and cannot predict on unseen data.
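The public/private split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kaggle's actual scoring code: the function names, the 30% public fraction, and the use of plain accuracy as the evaluation metric are all hypothetical choices made for the example.

```python
import random


def split_leaderboard(n_rows, public_fraction=0.3, seed=0):
    """Randomly assign test-row indices to a public and a private subset.

    The 30% default is an assumption for illustration only.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * public_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])


def accuracy(y_true, y_pred, indices):
    """Fraction of correct predictions over the given row indices."""
    if not indices:
        raise ValueError("indices must not be empty")
    correct = sum(1 for i in indices if y_true[i] == y_pred[i])
    return correct / len(indices)


def score_submission(y_true, y_pred, public_idx, private_idx):
    """Score one submission separately on the public and private rows."""
    return {
        "public": accuracy(y_true, y_pred, public_idx),
        "private": accuracy(y_true, y_pred, private_idx),
    }


# Hypothetical usage: a submission that is right only on the public rows,
# as if it had been tuned by probing the public leaderboard.
public_idx, private_idx = split_leaderboard(10, public_fraction=0.3, seed=1)
public_rows = set(public_idx)
y_true = [1] * 10
y_pred = [1 if i in public_rows else 0 for i in range(10)]
scores = score_submission(y_true, y_pred, public_idx, private_idx)
print(scores)
```

In this toy case the submission scores perfectly on the public rows and gets nothing right on the private rows, which is the kind of gap that ranking on the hidden portion is meant to expose.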

Introducing Kaggle

People are always amazed by how far professional athletes outperform ordinary people in their sports. Kaggle turns machine learning into a sport. There are now over one million users on Kaggle and, on average, thousands of participants in each competition. Fierce competition pushes participants to explore more of machine learning and step outside their comfort zone.

Even very experienced Data Scientists are often surprised by how much a machine learning solution improves over the course of a competition. The top-ranking Data Scientists on Kaggle, known as Kaggle Masters and Grandmasters, have held top leaderboard positions across several competitions and are well recognized by the applied machine learning community.

Typically, these Data Scientists are proficient in:

  • Fast prototyping on new problems, even without prior domain expertise
  • The willpower to keep trying new ideas and never (Read more...)

This is a Security Bloggers Network syndicated blog post authored by Li Li. Read the original post at: Cylance Blog