Data Scientist | Designer | Programmer | Researcher | Lifelong Learner

Getting Address, Postal Code, Distance, and More

Photo by delfi de la Rua on Unsplash

Longitude and latitude are great for pinning down the locations of observations with relative precision, but these coordinates alone often do not tell the whole story. How can we easily convert them into something more meaningful?

Geopy

We can do many things with these coordinates using the Geopy library. Geopy can calculate the distance between different points, and it can also make API calls on our behalf to get information about these coordinates.
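To make the distance calculation concrete, here is a minimal sketch in plain Python of the haversine (great-circle) formula. This is not Geopy's implementation; Geopy ships its own distance functions. The function name, the Earth-radius constant, and the sample coordinates are assumptions chosen for illustration.

```python
import math

# Mean Earth radius in kilometres (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)

    # Haversine formula: a is the squared half-chord length between the points.
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative coordinates (approximate city centres).
new_york = (40.7128, -74.0060)
london = (51.5074, -0.1278)

distance = haversine_km(*new_york, *london)
print(f"New York to London: {distance:.0f} km")
```

The haversine formula treats the Earth as a sphere, so it is slightly less accurate than an ellipsoidal (geodesic) calculation, but it is easy to read and good enough for a rough check.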

Getting Address Information

First, we will talk about getting information such as postal code (zip code), city, neighborhood, etc. Geopy can make API requests to Nominatim (the geocoding software used for OpenStreetMap), Google Maps, Bing Maps, and many others. One thing to note is that Geopy does not provide this information itself; it simply helps the user connect to these services, so it’s important for users to understand the terms, pricing, and limitations of each API. …


NLP preprocessing, BoW, TF-IDF, Naive Bayes, SVM, Spacy, Shapely, LSTM, and more

Photo by Chris J. Davis on Unsplash

In this post, I will explain a few basic machine learning approaches to classifying tweet sentiment and how to run them in Python.

Sentiment Analysis

Sentiment analysis is used to identify the affect or emotion (positive, negative, or neutral) of the data. For a business, it is a simple way to determine customers’ reactions towards the product or service and to quickly pick up on any change of emotion that may require immediate attention. …


Detecting Pneumonia Using Convolutional Neural Network

Eigenimages of normal and pneumonia X-rays

In this blog, I will outline how to build a reliable image classification model using a convolutional neural network to detect the presence of pneumonia from chest X-ray images.

Pneumonia is a common infection that inflames the air sacs in the lungs, causing symptoms such as difficulty breathing and fever. …


Visualizing Patterns of Image Data

Photo by CDC on Unsplash

Exploratory data analysis consists of brief analyses that describe a dataset in order to guide the modeling process and answer preliminary questions. For classification problems, this might include looking at the distributions of variables or checking for meaningful patterns of predictors across different classes. The same holds for the classification of image data: we want to find out what meaningful information simple operations can give us. Here, I outline a couple of methods we can use to achieve this goal with the Chest X-Rays data [source]. This dataset consists of X-ray images of pneumonia patients and healthy controls.

Raw Comparison

First, we can start by simply looking at a few randomly sampled images. …


Predicting conditions of water points in Tanzania with Python

Photo by Magdalena Kula Manchee on Unsplash

According to Water.org and Lifewater International, out of 57 million people in Tanzania, 25 million do not have access to safe water. Women and children must travel multiple times each day to gather water, even though the safety of that water source is not guaranteed. In 2004, 12% of all deaths in Tanzania were due to water-borne illnesses.

Despite years of effort and large amounts of funding to resolve the water crisis in Tanzania, the problem remains. …


Selecting appropriate evaluation metrics for multiclass and binary classification problems in Python

Photo by Isaac Smith on Unsplash

An evaluation metric is a measure that we use to evaluate different models. Choosing an appropriate evaluation metric is a decision that requires a thorough understanding of the goal of a project, and it is a fundamental step before any of the modeling that follows. So why is it so important, and how should we choose?

Why is it important?

Let’s say you live in a city where it is sunny about 300 days a year. During the other 65 days, it gets very uncomfortable with heavy rain or snow. …
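As a small worked example of why the choice matters, consider a hypothetical forecaster that predicts "sunny" for every day of such a year. The always-sunny forecaster, the labels, and the variable names are assumptions made for illustration; they are not taken from the article.

```python
# A year with 300 sunny days and 65 bad-weather days, as in the scenario above.
y_true = ["sunny"] * 300 + ["bad"] * 65

# Hypothetical forecaster that always predicts "sunny".
y_pred = ["sunny"] * len(y_true)

# Accuracy: share of days where the prediction matches the actual weather.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for bad weather: share of bad-weather days that were actually flagged.
bad_days = sum(t == "bad" for t in y_true)
bad_days_caught = sum(t == "bad" and p == "bad" for t, p in zip(y_true, y_pred))
recall_bad = bad_days_caught / bad_days

print(f"accuracy = {accuracy:.3f}")
print(f"recall for bad weather = {recall_bad:.3f}")
```

The forecaster is right on 300 of 365 days, yet it never warns about a single bad-weather day, so accuracy alone would paint a misleading picture if catching those days is the goal.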


8 Different Ways to Clean Strings in Python

Photo by Murat Onder on Unsplash

Recently I found myself spending many hours trying to make sense of messy text data, and decided to review some of the preprocessing involved. There are many different ways to achieve a simple cleaning step. Today, I will review several methods for removing punctuation from a string and compare their performance.

Using Translate

The string translate method is a convenient way to change multiple characters to different values at once. Translate requires a table that works like a dictionary, mapping each character to its replacement. The str.maketrans method builds that table for you.

The maketrans syntax looks like str.maketrans('abcd', '0123', 'xyz'). It creates a table that tells translate to replace every a with 0, b with 1, c with 2, etc., …
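A minimal sketch of the call above, using only the standard library. In Python, the optional third argument to str.maketrans lists characters to delete. The sample strings and variable names are assumptions chosen for illustration.

```python
import string

# Map a->0, b->1, c->2, d->3 and delete every x, y, and z.
table = str.maketrans('abcd', '0123', 'xyz')
result = 'abcd xyz'.translate(table)
print(repr(result))

# To strip punctuation, leave the first two arguments empty and pass
# the characters to delete as the third argument.
punct_table = str.maketrans('', '', string.punctuation)
cleaned = "Hello, world! It's 5 o'clock...".translate(punct_table)
print(cleaned)
```

Building the table once and reusing it across many strings avoids repeating the setup work for every call.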


Developing spatial ability to enhance performance in STEM

Photo by davisco on Unsplash

Shortly after graduating from Rhode Island School of Design (RISD), I received an alumni email from RISD about its former President John Maeda’s initiative to integrate arts into STEM (science, technology, engineering, and mathematics) education, calling it STEAM. As a designer in love with math and science, this email touched me on an emotional level.

(For New York Times coverage on John Maeda and RISD’s initiative, see here.)

Supporters of STEAM argue that creativity and problem-solving skills are intricately tied to success in STEM education. Although I do agree with this point of view, this argument can be somewhat arbitrary. For one, we don’t fully understand what creativity means, and even if we did, we know it is not necessarily equivalent to learning how to draw. …


Future of a Movie Studio

Photo by King Lip on Unsplash

Data visualization is a key step in any data science project. During exploratory data analysis, visualizing data allows us to locate outliers and identify distributions, helping us control for possible biases in our data early on. Coupled with simple statistical tests, it can also answer many of our questions and help us prioritize areas to focus on.

Here, I will go through some of the exploratory data analysis and data visualization steps in Python using Matplotlib and Seaborn libraries. …


Studying the progression of depressive symptoms using multiple linear regression

Photo by ORNELLA BINNI on Unsplash

According to the National Institute of Mental Health (NIMH), in 2017, suicide was the tenth leading cause of death in the US and the second among people aged 10 to 34. Yet, suicide is also one of the most stigmatized topics. People avoid talking about suicidal ideation, which makes it difficult to prevent the progression of depressive symptoms. It also makes it harder to learn what makes people go from feeling unmotivated or depressed to having suicidal ideation.

So what leads people from sometimes feeling down to thinking of suicide?

The goal of this analysis is to identify some of the key factors that contribute to the progression of depressive symptoms. I hope that this project can shed some light on future suicide prevention efforts. …
