Pauline C

To build an ML model, I follow the CRISP-DM process (CRoss Industry Standard Process for Data Mining), a process every data scientist should know. There are 6 steps, explained in this thread with examples:

1. Business understanding: What are your objectives? What problem do you want to solve? What is the context? It's question time! Ask the client, or yourself, a lot of questions. Ex: “Nobody likes my tweets. I would like to predict whether a tweet will get likes.”

2. Data understanding: What data do you have? Do you understand every field? Explore it and assess its quality. Again, ask a lot of questions. Think about the relationships between the data and your objective. “I have a collection of tweets with number of retweets, likes, author name, author id, time…”

3. Data preparation: Select the useful data. Clean it. Prepare the data for modeling, and add new columns / features. It's exploration time :). “I don’t need the author name and date. I will drop this information.” NB: this example can be challenged 🙃

4. Modeling: Build and test algorithms, and adjust their parameters. Do you need prediction, classification, NLP...? “I can try to extract the topic of my tweets, compare it to my collection, […], then apply a predictive model […]”

5. Evaluation: Assess your models. Do they answer the business needs? If not, iterate again. “My predictions give a number of retweets. I would rather get the number of likes, so I need to change my model.”

6. Deployment: When your model is good and solves the business problem, deploy it. “It works well on my machine, now I want to access my service from everywhere and impact the world 😍” #PaulineDataCommunity
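
To make steps 2 to 5 concrete, here is a minimal sketch in plain Python on the tweet example. Everything in it is an assumption made for illustration: the eight tweets are invented, the threshold for calling a tweet “liked” (`LIKE_THRESHOLD`) is arbitrary, and the “model” is a deliberately simple one-feature rule, not a recommendation.

```python
# Toy walk-through of CRISP-DM steps 2 to 5. All data below is invented.

# Step 2, data understanding: which fields do we have, and what do they mean?
raw_tweets = [
    {"author_name": "me", "text": "CRISP-DM in 6 steps #datascience", "hour": 9, "likes": 14},
    {"author_name": "me", "text": "coffee", "hour": 7, "likes": 0},
    {"author_name": "me", "text": "My new model is live #ml", "hour": 18, "likes": 9},
    {"author_name": "me", "text": "monday again", "hour": 8, "likes": 1},
    {"author_name": "me", "text": "Clean your data first #dataprep", "hour": 12, "likes": 7},
    {"author_name": "me", "text": "lunch was fine", "hour": 13, "likes": 2},
    {"author_name": "me", "text": "Slides from my talk #nlp", "hour": 20, "likes": 11},
    {"author_name": "me", "text": "so tired today", "hour": 23, "likes": 0},
]

LIKE_THRESHOLD = 5  # hypothetical definition of a "liked" tweet


def prepare(tweet):
    """Step 3: drop unused fields, derive binary features and a label."""
    features = {
        "has_hashtag": "#" in tweet["text"],
        "is_long": len(tweet["text"]) > 20,
        "daytime": 9 <= tweet["hour"] <= 18,
    }
    label = tweet["likes"] >= LIKE_THRESHOLD
    return features, label  # author_name is intentionally left out


def accuracy(feature_name, rows):
    """Share of rows where the rule 'liked if and only if feature' is right."""
    return sum(f[feature_name] == y for f, y in rows) / len(rows)


def train_one_rule(rows):
    """Step 4: pick the single feature that best matches the label."""
    names = rows[0][0].keys()
    return max(names, key=lambda name: accuracy(name, rows))


prepared = [prepare(t) for t in raw_tweets]
train, test = prepared[:6], prepared[6:]  # hold out the last two tweets

best_feature = train_one_rule(train)
test_accuracy = accuracy(best_feature, test)  # Step 5: evaluate on unseen rows

print(best_feature, test_accuracy)
```

A score on two made-up test tweets says nothing about real tweets. The point is only the shape of the loop: prepare, fit on one part of the data, evaluate on the rest, then go back to an earlier step if the result does not answer the business question. Note that this sketch keeps the posting hour as a feature, which is one way the “drop the date” choice in step 3 can be challenged.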
