
I train machines to train models.
Why We Need PEP8
PEP8 was designed to make Python code more readable. If you're new to Python, remembering what a piece of code does a few days or weeks after you wrote it can be challenging. Following PEP8 helps ensure that your variables are properly named.
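A tiny illustration of the naming point (hypothetical variables, not from the post): PEP8 favours descriptive snake_case names over terse or CamelCase ones.

```python
# Discouraged: a terse name whose meaning is forgotten in a week.
d = 86400

# PEP8-friendly: lowercase snake_case that documents itself.
seconds_per_day = 86400
hours_per_day = seconds_per_day // 3600
```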
2021-12-15
Blind Spot about Sklearn Confusion Matrix
Evaluating the models we develop for machine learning or deep learning projects is crucial. A confusion matrix is one of the best ways to see whether predictions are classified correctly. The confusion matrix function in the sklearn package, however, uses a layout that differs from the one we usually find on other websites.
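For readers skimming this index, the blind spot in question is likely the layout: sklearn's confusion_matrix puts true labels on rows and predicted labels on columns, so for binary labels [0, 1] the top-left cell is the true negatives, not the true positives as in many textbook diagrams. A minimal sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels purely for illustration.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]

# Rows are true labels, columns are predictions (in sorted label order),
# so cm[0, 0] counts true negatives and cm[1, 1] counts true positives.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
```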
2021-10-12
Detect Covariate Shift
A supervised machine learning model has two phases: training and testing. When these models are trained, validated, and tested, the train and test data points are normally presumed to follow the same distribution. In the real world, however, the training and test datasets rarely do.
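One common way to check a single feature for such a shift (a sketch on synthetic data, not necessarily the method the post uses) is a two-sample Kolmogorov-Smirnov test between the train and test samples:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # synthetic "train" sample
test_feature = rng.normal(loc=0.5, scale=1.0, size=1000)   # deliberately shifted "test" sample

# A small p-value indicates the two samples likely come from different
# distributions, i.e. a covariate shift on this feature.
stat, p_value = ks_2samp(train_feature, test_feature)
```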
2021-05-22
Predicting Stock Price using LSTM
This article builds a model that predicts stock prices as well as possible. It is an example of how you can apply a Long Short-Term Memory (LSTM) neural network to real-world time series data with PyTorch. That said, there are certainly much better models for forecasting stock prices.
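A minimal sketch of the kind of model involved (assumed shapes and layer sizes, not the post's exact architecture): an nn.LSTM over a window of past prices with a linear head predicting the next value.

```python
import torch
import torch.nn as nn

class PriceLSTM(nn.Module):
    """Toy LSTM regressor: a window of past prices -> next-step prediction."""
    def __init__(self, n_features: int = 1, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)            # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1, :])  # predict from the last time step

model = PriceLSTM()
window = torch.randn(8, 30, 1)  # 8 samples, 30 past days, 1 feature (price)
prediction = model(window)
```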
2021-05-01
Notes on Feature Engineering
Without sufficient data and suitable features, even the most powerful model structure cannot produce satisfactory output. As the classic saying goes, "Garbage in, garbage out." For a machine learning problem, the data and features often determine the upper limit of the results, while the choice of models, algorithms, and optimisation only gradually approaches that limit.
2021-03-22
Calculate the Singular Value Decomposition
Singular Value Decomposition (SVD) is a widely used technique to decompose a matrix into several component matrices, exposing many of the useful and interesting properties of the original matrix.
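As a quick taste of the decomposition (the article derives it by hand; this sketch just verifies the factorisation numerically with NumPy):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# A = U @ diag(s) @ Vt, with orthogonal U and Vt and non-negative
# singular values s returned in decreasing order.
U, s, Vt = np.linalg.svd(A)
A_reconstructed = U @ np.diag(s) @ Vt
```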
2021-03-16
Visualise Crypto and Twitter with SQL and FastAPI
In this article, I'll first populate a crypto database using Python and SQL: I retrieve the list of crypto coin assets, verify the data, and tackle any errors I encounter along the way. Second, I'll talk about how to keep the database up to date with the latest prices, retrieving daily data from the Yahoo Finance API. Third, a database for Twitter data will be built as well. Finally, I'll build a web UI using FastAPI.
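A flavour of the first step, reduced to the standard library (a hypothetical schema and rows for illustration, not the article's actual tables):

```python
import sqlite3

# In-memory database with a hypothetical assets table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (symbol TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO assets (symbol, name) VALUES (?, ?)",
    [("BTC", "Bitcoin"), ("ETH", "Ethereum")],
)
conn.commit()

# Verify what was stored before building anything on top of it.
rows = conn.execute("SELECT symbol FROM assets ORDER BY symbol").fetchall()
```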
2021-03-15
Principal Component Analysis Derivation
Principal Component Analysis (PCA) is an important technique to understand in the fields of statistics and data science. It is a process of computing the principal components and utilising them to perform a change of basis on the data. It is very hard to visualise and understand data in high dimensions; this is where PCA comes to the rescue.
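The change of basis boils down to eigendecomposing the covariance matrix of the centred data and projecting onto the top eigenvectors; a small NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples, 3 features

X_centered = X - X.mean(axis=0)        # PCA assumes centred data
cov = X_centered.T @ X_centered / (len(X) - 1)

# Eigenvectors of the covariance matrix are the principal components.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]      # sort by explained variance, descending
components = eigvecs[:, order]

Z = X_centered @ components[:, :2]     # change of basis onto the top-2 components
```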
2021-03-13
Simplest way to Build Web Crawler
A web crawler, sometimes called a spiderbot or scraper, is an internet bot that systematically browses the net. With one, we can get the information we need without copy-pasting. The goal of this article is to show you how I scrape the web and store the results in a database or CSV file.
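The core idea, stripped down to the standard library (a sketch that parses an inline HTML string standing in for a fetched page; a real crawler would download the page first):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects every href attribute from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Stand-in for a downloaded page.
html = '<ul><li><a href="/page1">One</a></li><li><a href="/page2">Two</a></li></ul>'
parser = LinkCollector()
parser.feed(html)
```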
2021-01-25
Set Up Anaconda for Python
Recently, Python has been getting more popular because it lets you complete a project in a short time. However, setting up virtual environments is crucial when working on several projects. In this article, I will introduce how I set up an Anaconda environment for Python.
2020-12-31
EDA for Predicting Insurance Claim
Exploratory Data Analysis (EDA) means understanding data sets by summarising their main characteristics, often by plotting them visually. This step is very important, especially when we move on to modelling the data in order to apply machine learning. In this article, I'll show you how I did it!
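A taste of the kind of first-pass summary involved (a toy frame standing in for the insurance data, not the article's data set):

```python
import pandas as pd

# Toy data: two numeric columns standing in for the real insurance frame.
df = pd.DataFrame({"age": [25, 32, 47, 51], "claim": [0, 1, 1, 0]})

summary = df.describe()          # count, mean, std, quartiles per column
claim_rate = df["claim"].mean()  # share of rows with a claim
```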
2020-11-27
Collect Tweets using Twint
Twint is an advanced Python-based Twitter scraping tool that allows you to scrape Tweets from Twitter profiles without using Twitter's API. Twint makes use of Twitter's search operators to let you scrape Tweets from specific individuals, scrape Tweets referring to specific themes, hashtags, and trends, and pull out information like e-mail addresses and phone numbers from Tweets. This is something I find quite handy, and you can get fairly creative with it as well.
2020-10-16
QS Ranking Crawler
This article aims to build a web scraper using BeautifulSoup and Selenium and scrape the QS Rankings to discover the top universities from all over the world. "Uni name", "ranking", and "location" are fetched from the table and stored as a CSV file. A Jupyter notebook is available through my GitHub as well.
2020-04-15