Photo by Jonathan Pielmayer on Unsplash

What is RFM analysis?

RFM analysis is a customer behavior segmentation technique. Based on customers’ historical transactions, RFM analysis focuses on 3 main aspects of customers’ transactions: recency, frequency and purchase amount. Understanding these behaviors will allow businesses to cluster different customers into groups.
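To make the three measures concrete, here is a minimal pandas sketch. The tiny transactions table and its column names (`customer_id`, `invoice_no`, `invoice_date`, `amount`) are assumptions made up for illustration; they are not the dataset used in the article, and the choice of "day after the last transaction" as the reference date is only one common convention.

```python
import pandas as pd

# Hypothetical transactions; names and values are placeholders, not the article's data.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "invoice_no": ["A1", "A2", "B1", "C1", "C2", "C3"],
    "invoice_date": pd.to_datetime([
        "2021-01-05", "2021-03-01", "2021-02-10",
        "2021-01-20", "2021-02-15", "2021-03-10",
    ]),
    "amount": [20.0, 35.0, 50.0, 10.0, 15.0, 5.0],
})

# Assumed reference date: the day after the last recorded transaction.
snapshot = transactions["invoice_date"].max() + pd.Timedelta(days=1)

rfm = transactions.groupby("customer_id").agg(
    # Recency: days since the customer's most recent purchase.
    recency=("invoice_date", lambda d: (snapshot - d.max()).days),
    # Frequency: number of distinct invoices.
    frequency=("invoice_no", "nunique"),
    # Monetary: total amount spent.
    monetary=("amount", "sum"),
)
print(rfm)
```

Each customer ends up with one row of recency, frequency and monetary values, which can then be binned or scored to form segments.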

How do I apply RFM analysis?

Here is the dataset of a store whose customers come from all over the world. It includes information such as invoice number, invoice date, customer ID, stock code, product description, purchased quantity and the country where the customer lives. In this article, I will only show you my method of doing the RFM analysis, and not my steps…

Photo by Markus Spiske on Unsplash


When I first learned SQL, I had trouble telling PARTITION BY and GROUP BY apart, since both are used for grouping. I believe many people who are starting out with SQL run into the same problem. Therefore, in this article I want to share some examples of using PARTITION BY and show how it differs from GROUP BY in a SELECT statement.
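As a rough illustration of the difference, the sketch below runs both clauses against an in-memory SQLite database from Python. The `sales` table, its columns and its rows are hypothetical and are not the dataset built in the article. It also assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (employee TEXT, department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("An", "Tech", 100), ("Binh", "Tech", 200), ("Chau", "HR", 50)],
)

# GROUP BY collapses each department into a single summary row.
grouped = conn.execute(
    "SELECT department, SUM(amount) FROM sales "
    "GROUP BY department ORDER BY department"
).fetchall()

# PARTITION BY keeps every original row and attaches the department total to it.
partitioned = conn.execute(
    "SELECT employee, department, amount, "
    "SUM(amount) OVER (PARTITION BY department) AS dept_total "
    "FROM sales ORDER BY employee"
).fetchall()

print(grouped)
print(partitioned)
```

The grouped query returns one row per department, while the partitioned query returns all three employees, each carrying the total of their own department.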


Firstly, I create a simple dataset with 4 columns. …

Photo by Jim Kalligas on Unsplash


Subset selection is one of the most frequently performed steps in data manipulation. Pandas offers many different ways to filter a dataframe and get the subset of data you need. In this article, I will show you the cases that I encounter most often when manipulating data.

Before coming to details, I will first create a sample dataframe.

# Create a simple dataframe
df = pd.DataFrame({
    'name': ['Chi', 'Alex', 'Sam', 'Hoang', 'Linh', 'Dung', 'Anh'],
    'function': ['Marketing', 'Tech', 'Tech', 'Finance', 'Finance', 'Marketing', 'HR'],
    'address': ['Hanoi', 'Saigon', 'Hanoi', 'Saigon', 'Hanoi', 'Hanoi', 'Saigon'],
    'gender': ['F', 'M'…
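A few common filtering patterns can be sketched on a cut-down version of this dataframe. The version below keeps only the first four rows and three of the columns, and the four filters shown are my own picks of typical cases, not necessarily the ones covered in the full article.

```python
import pandas as pd

# Cut-down sample: first four rows, three columns only.
df = pd.DataFrame({
    "name": ["Chi", "Alex", "Sam", "Hoang"],
    "function": ["Marketing", "Tech", "Tech", "Finance"],
    "address": ["Hanoi", "Saigon", "Hanoi", "Saigon"],
})

# 1. Boolean mask on a single condition.
tech = df[df["function"] == "Tech"]

# 2. Several conditions: wrap each in parentheses and combine with & (and) or | (or).
tech_hanoi = df[(df["function"] == "Tech") & (df["address"] == "Hanoi")]

# 3. Match against a list of values with isin().
mk_fin = df[df["function"].isin(["Marketing", "Finance"])]

# 4. Filter rows and pick columns in one step with .loc.
saigon_names = df.loc[df["address"] == "Saigon", ["name", "function"]]

print(tech, tech_hanoi, mk_fin, saigon_names, sep="\n\n")
```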

Photo by Manja Vitolic on Unsplash


In my previous post, I introduced some simple visualization tips to quickly build good-looking charts with Seaborn and Matplotlib. Today, I’m going to show you in detail how to build more complex charts, including combination charts and subplots.
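For readers who have not met these two ideas before, here is a minimal Matplotlib-only sketch of a combination chart (bars plus a line on a secondary axis) placed next to a second subplot. The months and numbers are placeholder values invented for illustration and have nothing to do with the Kaggle dataset; the `Agg` backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Placeholder values for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
bar_values = [10, 14, 9, 17]
line_values = [0.2, 0.35, 0.3, 0.5]

# One figure, two subplots side by side.
fig, (ax_combo, ax_plain) = plt.subplots(1, 2, figsize=(10, 4))

# Left subplot: bars on the primary y-axis, a line on a twin y-axis.
ax_combo.bar(months, bar_values, color="steelblue")
ax_combo.set_ylabel("bar series")
ax_line = ax_combo.twinx()  # shares the x-axis, adds a second y-axis
ax_line.plot(months, line_values, color="darkorange", marker="o")
ax_line.set_ylabel("line series")
ax_combo.set_title("Combination chart")

# Right subplot: an ordinary line chart.
ax_plain.plot(months, bar_values, marker="o")
ax_plain.set_title("Second subplot")

fig.tight_layout()
```

The key calls are `plt.subplots`, which returns the grid of axes, and `twinx`, which overlays a second y-axis so two series with different scales can share one panel.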

Necessary Package Installation

There are some packages that we should import first.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Data Description

My dataset comes from a public Kaggle dataset, which you can easily download via the following link:

Let’s take a look at our dataset. I will use data related to Germany as an example.


Photo by Soraya Irving on Unsplash


Exploratory Data Analysis (EDA) is an indispensable step in data mining. To interpret various aspects of a data set like its distribution, principal or interference, it is necessary to visualize our data in different graphs or images. Fortunately, Python offers many libraries that make visualization more convenient and easier than ever. Some of them, such as Matplotlib, Seaborn, Plotly and Bokeh, are widely used today.

Since my job concentrates on scrutinizing all angles of data, I have been exposed to many types of graphs. However, because there are way too many functions and the codes are not…

Photo by Anz Design on Unsplash

1. Problem’s Description

Hanoi is the capital of Vietnam and a leading city that plays a significant role in the growth of the country. Because of its dense population and its openness to new opportunities, Hanoi is an ideal place for investors and entrepreneurs to start their businesses or make valuable investments.

However, from an investor’s perspective, it is hard to figure out which type of business to open, and in which area that business would attract customers and give the owner an optimal profit.

By using data science and exploring some geographic data of Hanoi, you can have…

At first, “Data Crawling” gave me the impression of a difficult task that only programming experts could carry out. But after a few hours of researching BeautifulSoup, I am now able to do some basic web scraping, even though my technical skills are limited.

Photo by Franki Chamaki on Unsplash

Scraping data is a necessary skill, especially if you are working in fields related to data and analytics. When doing different kinds of analysis, you will need to collect enough data to perform your task. Normally, you would search the Internet for CSV files or page sharing APIs to access…
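To give a feel for how little code basic scraping takes, here is a minimal BeautifulSoup sketch. It parses a small HTML string written inline for illustration instead of fetching a real page, so the markup, class name and links are all hypothetical. It assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; a real script would download this HTML first.
html = """
<html><body>
  <h1>Sample listing</h1>
  <ul>
    <li class="item"><a href="/a">First</a></li>
    <li class="item"><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

# Parse with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Grab the text of the first <h1> tag.
title = soup.find("h1").get_text(strip=True)

# Collect (link text, href) for every <li class="item"> element.
links = [
    (li.a.get_text(strip=True), li.a["href"])
    for li in soup.find_all("li", class_="item")
]

print(title)
print(links)
```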

There are 2 warnings that I think you should consider before reading “The Fountainhead”:

  • First, the book is incredibly thick, with 1200 pages! It could literally and figuratively serve as a pillow. So the book may not be suitable for people who lack patience or get bored easily.

If you are a non-tech person who is looking for a way to build your own interactive dashboard, you can consider Streamlit.

So what is Streamlit? And how does it support you in creating an interactive dashboard?

Let’s find out in this 5-minute read.

What is Streamlit?

Streamlit is an open-source Python package that helps you build deployable, interactive web apps without any knowledge of HTML, CSS, etc. Python is all you need. The great thing about Streamlit is that it automatically refreshes your web app whenever the source code or its inputs change.

There are numerous visualization libraries that Streamlit…

Chi Nguyen

An introverted girl who craves learning and writing
