Every moment, our sensory system collects a wealth of information and sends it to our brain in a format the brain can understand. Within a split second of looking at a scene, we can identify the individual objects in view. Even in a crowd, we can tell people apart from trees and quickly spot faces that are familiar to us. How do we do this? And can a machine see the way we see?
The basic mechanism of our vision is as follows: light from the external visual field enters our eyes. This light is projected to…
I have been working on an exploratory project to build a conceptual description model that generates interpretations of artworks. As a starting point, I used a CNN-LSTM architecture that works as a simple caption generator. In this post, I will describe how to build a basic CNN-LSTM model that outputs text captions for images. Stay tuned for more on the project later.
The main approach to image captioning here has three parts: 1. use a pre-trained object-recognition network to extract features from images, and 2. map these extracted feature embeddings…
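To make the first two steps concrete, here is a minimal NumPy sketch of one common way to condition an LSTM on an image: project the CNN feature vector into the word-embedding space and feed it as the first input to an LSTM cell. This is an assumption for illustration, not necessarily the exact architecture used in the post; the sizes are hypothetical and the random weights stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 2048   # hypothetical size of the CNN feature vector
EMBED_DIM = 256   # hypothetical word-embedding size
HIDDEN_DIM = 512  # hypothetical LSTM hidden size


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Random weights stand in for trained parameters.
W_img = rng.normal(scale=0.01, size=(EMBED_DIM, FEAT_DIM))  # image feature -> embedding space
W = rng.normal(scale=0.01, size=(4 * HIDDEN_DIM, EMBED_DIM + HIDDEN_DIM))  # stacked gate weights
b = np.zeros(4 * HIDDEN_DIM)


def lstm_step(x, h, c):
    """One LSTM cell step: returns the new hidden and cell states."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate values
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new


# 1. Pretend this vector came from a pre-trained CNN's penultimate layer.
image_features = rng.normal(size=FEAT_DIM)

# 2. Map the image features into the same space as the word embeddings.
x0 = W_img @ image_features

# 3. Feed the projected image as the first LSTM input; later steps would take word embeddings.
h, c = np.zeros(HIDDEN_DIM), np.zeros(HIDDEN_DIM)
h, c = lstm_step(x0, h, c)
print(h.shape)  # (512,)
```

In a full captioning model, the hidden state `h` would typically pass through a dense layer with a softmax over the vocabulary to predict the next word.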
Creating presentable plots in Python can be a bit daunting, especially if you are used to making visualizations in BI software or even R, where most plots come already prettified. Another problem is that many things can go wrong, and how to resolve each issue depends on the choices you made for the plot. Here, I will demonstrate a few easy ways to create plots in Python for various scenarios and show you how to resolve some of the issues that may arise in each case.
In this post…
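As a small, generic example (not the post's own code), here is one way to tidy a default Matplotlib plot with the object-oriented API: label everything explicitly, remove unneeded spines, and call `tight_layout` so labels are not clipped. The data are made up, and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Made-up example data.
categories = ["A", "B", "C", "D"]
values = [23, 17, 35, 29]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(categories, values, color="steelblue")

# Explicit labels and a title make the plot presentable.
ax.set_title("Values by category")
ax.set_xlabel("Category")
ax.set_ylabel("Value")

# Remove the top and right spines for a cleaner look.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

fig.tight_layout()  # adjust padding so axis labels are not cut off
fig.savefig("bar_chart.png", dpi=150)
```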
Processing a large amount of data on a local machine can be a hassle, especially when the primary purpose is exploratory. I ran into this problem when I wanted to look at the overall trends in open policing data.
The Stanford Open Policing Project gathers data on vehicular and pedestrian stops made by police across the country. It offers a well-organized series of datasets divided by location. …
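One common way to keep memory use manageable when exploring a large CSV locally is to read it in chunks with pandas and aggregate as you go. The sketch below assumes a file with a `date` column as a stand-in for a policing dataset; the column names and the tiny in-memory CSV are hypothetical, so adapt them to the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file on disk.
csv_data = io.StringIO(
    "date,outcome\n"
    "2015-01-03,warning\n"
    "2015-06-11,citation\n"
    "2016-02-20,citation\n"
    "2016-09-05,arrest\n"
    "2017-04-14,warning\n"
)

yearly_counts = pd.Series(dtype="int64")

# Read only the column we need, a few rows at a time, and accumulate counts per year.
for chunk in pd.read_csv(csv_data, usecols=["date"], parse_dates=["date"], chunksize=2):
    counts = chunk["date"].dt.year.value_counts()
    yearly_counts = yearly_counts.add(counts, fill_value=0)

yearly_counts = yearly_counts.astype("int64").sort_index()
print(yearly_counts)
```

With a real file, you would pass its path instead of the `StringIO` object and choose a much larger `chunksize`.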
For data scientists, Pandas and NumPy are both essential Python tools. NumPy runs vector and matrix operations very efficiently, while Pandas provides R-like data frames that allow intuitive tabular data analysis. The consensus is that NumPy is better optimized for arithmetic computations. Is this always the case? I decided to put them to the test.
First, I compared NumPy vector arrays with Pandas Series and the Python list object. Then I compared NumPy matrices with Pandas DataFrames and nested lists. The list object's performance was generally not comparable, so…
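A minimal way to run this kind of comparison yourself is with Python's `timeit` module. The sketch below times an element-wise multiply-and-sum on a NumPy array, a Pandas Series, and a plain list. The vector length and repeat counts are arbitrary choices, and timings vary by machine and library version, so no results are shown here.

```python
import timeit

import numpy as np
import pandas as pd

N = 100_000  # arbitrary vector length
arr = np.arange(N, dtype=np.float64)
ser = pd.Series(arr)
lst = arr.tolist()


def numpy_op():
    return (arr * 2).sum()


def pandas_op():
    return (ser * 2).sum()


def list_op():
    return sum(x * 2 for x in lst)


for name, fn in [("NumPy array", numpy_op), ("Pandas Series", pandas_op), ("Python list", list_op)]:
    # Take the best of 5 repeats, each running the operation 20 times.
    best = min(timeit.repeat(fn, number=20, repeat=5))
    print(f"{name:>14}: {best / 20 * 1e6:.1f} us per call")
```

Taking the minimum of several repeats reduces noise from other processes; checking that all three functions return the same value guards against comparing different computations.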
This is a neuron. It's a nerve cell. Neurons are the fundamental units of the human brain, and our brain has about 90 billion of them. Neurons are responsible for all connections to, from, and within the central nervous system (brain & spinal cord). Neurons communicate with other cells by sending and receiving electrical and chemical signals through synapses. Today, we are going to focus on this connective element, the synapse.
Now, the interesting thing is that synapses change based on the activity of the neurons they connect. If signals are transmitted repeatedly and persistently between specific neurons, this…
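The idea that a connection strengthens when two neurons are repeatedly active together is often summarized by Hebb's rule. As a hedged illustration (a textbook simplification, not necessarily the model this post goes on to describe), a basic Hebbian update grows each weight in proportion to the product of presynaptic and postsynaptic activity, $\Delta w_{ij} = \eta\, x_i\, y_j$. The learning rate and activity values below are made up.

```python
import numpy as np

eta = 0.1  # hypothetical learning rate

# Made-up activity patterns: 3 presynaptic neurons, 2 postsynaptic neurons.
pre = np.array([1.0, 0.0, 1.0])
post = np.array([1.0, 0.0])

weights = np.zeros((3, 2))  # weights[i, j]: synapse from presynaptic i to postsynaptic j

# Present the same pattern repeatedly: only synapses between co-active neurons grow.
for _ in range(5):
    weights += eta * np.outer(pre, post)

print(weights)
```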
We all have a love-hate relationship with databases, or more specifically with the database management system (DBMS). It is an integral part of our work that defines how we access one of the most valuable assets of the 21st century; it drives much of our passion and determines our workflow. But it is also arguably one of the most tedious parts of the data science process. In this post, I will discuss some of the fundamental concepts behind databases and tie together a few popular terms we hear around the data world.
The goal of this post is to identify some of…
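To tie a few of these terms to something concrete, here is a small example using Python's built-in `sqlite3` module, which talks to SQLite, a lightweight relational DBMS. It defines a table with a primary key, inserts rows inside a transaction, and queries them with SQL. The table and its contents are hypothetical.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a schema: each row is uniquely identified by its primary key.
conn.execute(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT
    )
    """
)

# Insert rows; `with conn` commits the transaction on success or rolls it back on error.
with conn:
    conn.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "New York"), ("Alan", "London")],
    )

# Query with SQL: count customers per city.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('London', 2), ('New York', 1)]
conn.close()
```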
Longitude and latitude are great measures for obtaining relatively precise locations of observations, but these coordinates alone often do not tell the whole story. How can we easily convert them into something more meaningful?
We can do many things with these coordinates using the Geopy library. Geopy can calculate the distance between points, and it can also make API calls on our behalf to get information about these coordinates.
First, we will talk about getting information such as the postal code (zip code), city, neighborhood, and so on. Geopy can make API requests to Nominatim (a geocoding tool for OpenStreetMap data)…
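For distances, Geopy's `geopy.distance.geodesic` works on an ellipsoidal model of the Earth. To show roughly what such a calculation involves without any third-party dependency, here is a simpler, swapped-in technique: the haversine great-circle formula, which treats the Earth as a sphere and so differs slightly from Geopy's geodesic result. The coordinates are approximate and only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates of two cities, for illustration only.
new_york = (40.7128, -74.0060)
los_angeles = (34.0522, -118.2437)
print(round(haversine_km(*new_york, *los_angeles)))
```

With Geopy installed, the ellipsoidal equivalent is `geodesic(new_york, los_angeles).km` after `from geopy.distance import geodesic`.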
In this post, I will explain a few basic machine learning approaches to classifying tweet sentiment and how to run them in Python.
Sentiment analysis is used to identify the affect or emotion (positive, negative, or neutral) expressed in the data. For a business, it is a simple way to gauge customers' reactions to a product or service and to quickly pick up on any change in sentiment that may require immediate attention. The most basic approach to this problem is supervised learning: we can have humans determine and label the sentiment of our data and treat…
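As a minimal illustration of the supervised approach, here is a from-scratch multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python so it runs without extra libraries. It is one of several basic approaches and not necessarily the one this post uses; the tiny labeled "tweets" are made up, and real use would need far more data and better tokenization.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def train(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(text, class_counts, word_counts, vocab):
    """Return the label with the highest log-posterior under Laplace smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up labeled examples.
tweets = [
    "I love this phone, the battery is great",
    "Great service and friendly staff",
    "Absolutely love the new update",
    "Terrible battery, I hate it",
    "Worst customer service ever",
    "The update is awful and slow",
]
labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

model = train(tweets, labels)
print(predict("love the great staff", *model))
print(predict("awful slow service", *model))
```

Working in log space avoids numerical underflow when multiplying many small word probabilities.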
In this blog post, I will outline how to build a simple image classification model that uses a convolutional neural network to detect pneumonia in chest X-ray images.
Pneumonia is a common infection that inflames the air sacs in the lungs, causing symptoms such as difficulty breathing and fever. Even though pneumonia is not difficult to treat, timely diagnosis is crucial. Without proper treatment, pneumonia can be fatal, especially among children and the elderly. A chest X-ray is an affordable method for diagnosing pneumonia. …
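To make the core operation of a convolutional layer concrete, here is a hedged NumPy sketch of a single convolution (strictly, cross-correlation, which is what deep-learning libraries compute) followed by ReLU and 2x2 max pooling on a grayscale image array. The random "X-ray" and the hand-picked edge-detecting kernel are placeholders; in a real CNN the kernels are learned, many layers are stacked, and a final dense layer with a sigmoid would output a pneumonia probability.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling; trims an odd last row or column."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


rng = np.random.default_rng(0)
xray = rng.random((64, 64))  # placeholder for a normalized 64x64 grayscale image

# A vertical-edge (Sobel-like) kernel; a trained CNN would learn its own kernels.
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])

feature_map = max_pool_2x2(relu(conv2d_valid(xray, kernel)))
print(feature_map.shape)  # (31, 31)
```

Each convolution shrinks a 64x64 input to 62x62 without padding, and pooling halves it again, which is why CNNs progressively trade spatial resolution for more abstract features.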