
I train machines to train models.
End-to-End Word2Vec Training
Word2Vec rests on a straightforward idea: we assume that a word's meaning can be derived from the company it keeps. If two words tend to appear alongside similar neighbours, their meanings are likely to be similar as well. Building on this assumption, you can use Word2Vec to compute the similarity between two words, and much more.
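As a rough sketch of that idea, something like the following gensim snippet trains a tiny model and compares two words; the toy corpus and hyperparameters are made up purely for illustration.

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: words that share neighbours end up with similar vectors.
sentences = [
    ["the", "cat", "sits", "on", "the", "mat"],
    ["the", "dog", "sits", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# Train a small skip-gram model; real corpora need far more data than this.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

# Cosine similarity between the learned vectors for "cat" and "dog".
print(model.wv.similarity("cat", "dog"))
```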
2022-07-09
Self-Attention for NLP
In short, an attention-based model "focuses" on the elements of its input (a word in a sentence, a position in an image, and so on). "Focusing" means assigning each input element a different level of attention, so that elements are treated differently and some carry more weight in the result than others; a model without attention treats every element "equally".
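A minimal NumPy sketch of that weighting step, assuming scaled dot-product self-attention and toy dimensions, could look like this:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each position attends to every other
    weights = softmax(scores, axis=-1)        # rows sum to 1: the "focus" over the input
    return weights @ V                        # weighted combination of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                   # 4 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8)
```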
2021-09-21
BERT for Text Classification
Deep learning has improved the performance of neural network architectures such as recurrent neural networks (RNNs and LSTMs) and convolutional neural networks (CNNs) on a variety of Natural Language Processing (NLP) problems, including text categorisation, language modelling, and machine translation. Transfer learning is a method of reusing a deep learning model that has been trained on a big dataset to perform similar tasks on a new dataset; such a model is referred to as a pre-trained model, and demand for transfer learning in NLP has never been higher. In the paper "Attention Is All You Need", published in 2017, Google unveiled the transformer, which proved to be a watershed moment in NLP.
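As a hedged sketch of the transfer-learning idea, loading a pre-trained checkpoint with the Hugging Face transformers library might look like this; the bert-base-uncased checkpoint and the two-label head are illustrative choices, not necessarily the setup used in the article.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load a publicly available pre-trained checkpoint plus a fresh classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Transfer learning makes NLP much easier.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # the head is untrained: fine-tune before relying on this
print(logits.shape)                   # torch.Size([1, 2])
```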
2021-05-30
NLP Meets PyTorch Lightning
PyTorch Lightning is a Python package that provides a high-level interface for PyTorch, a well-known deep learning framework. It's a fast, lightweight framework that organises PyTorch code to separate research and engineering, making deep learning experiments easier to comprehend and reproduce.
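A minimal sketch of that separation, with a toy LightningModule as the "research" code and the Trainer as the "engineering" side; the dummy data and model are assumptions for illustration only.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class TinyClassifier(pl.LightningModule):
    """Research code: what the model is and how a training step is computed."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Engineering side: the Trainer handles the loop, devices, logging, checkpoints, etc.
ds = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
trainer = pl.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
trainer.fit(TinyClassifier(), DataLoader(ds, batch_size=16))
```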
2021-05-28
Harry Potter Movies Saga Analysis
In this article, a sentiment analysis is conducted through the lens of Harry Potter. I am a self-confessed Harry Potter devotee. I've read the books multiple times and watched the films more times than I can count. Each character's lines in the movies are rich in emotionally charged moments that the audience can viscerally feel. Can a computer capture that feeling? Let's check it out!
2021-04-28
MeCab and CaboCha for Japanese
In Python, there are several modules to choose from for Japanese morphological analysis, such as Janome, JUMAN, MeCab, and Esanpy. This time we will use MeCab, which is said to be relatively fast and accurate.
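A small usage sketch with the mecab-python3 bindings, assuming MeCab and a dictionary such as IPADIC are installed:

```python
import MeCab

# -Owakati outputs space-separated tokens (word segmentation only).
wakati = MeCab.Tagger("-Owakati")
print(wakati.parse("すもももももももものうち").strip())

# The default output additionally includes part-of-speech and reading information.
tagger = MeCab.Tagger()
print(tagger.parse("今日はいい天気です"))
```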
2021-04-27
Sentiment Analysis for Japanese Customer Reviews
The development of electronic business has been accelerated by the popularity of the internet. Millions of people buy products and post their reviews online, and these reviews can be used for public opinion analysis. Customers can make better decisions after reading other people's product reviews, so there is a pressing need for systems that can perform sentiment classification. In this article, I'll try to build a sentiment analysis model for Japanese customer reviews.
2021-04-23
Softmax and Cross-Entropy
I've been trying to implement a neural network from scratch in Python recently. To solve a multi-class classification problem, I set out to create a simple neural network. The most important part of a neural network is backpropagation, an algorithm for supervised learning of artificial neural networks using gradient descent. I want to derive the cross-entropy loss function combined with the softmax activation function, so this article records the formula I calculated. As for the rest, I will discuss it in the future.
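For reference, the derivative of the cross-entropy loss with respect to the logits, when a softmax is applied first, collapses to softmax(z) minus the one-hot target. The short NumPy sketch below checks that analytic gradient numerically; the toy logits are arbitrary.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, y):
    return -np.log(p[y])                  # y is the index of the true class

z = np.array([2.0, 1.0, 0.1])             # arbitrary logits
y = 0
p = softmax(z)
one_hot = np.eye(len(z))[y]
analytic_grad = p - one_hot               # dL/dz = softmax(z) - one_hot(y)

# Central-difference check of the analytic gradient.
eps = 1e-6
numeric_grad = np.array([
    (cross_entropy(softmax(z + eps * np.eye(len(z))[i]), y)
     - cross_entropy(softmax(z - eps * np.eye(len(z))[i]), y)) / (2 * eps)
    for i in range(len(z))
])
print(np.allclose(analytic_grad, numeric_grad, atol=1e-6))   # True
```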
2021-04-18
Viterbi Algorithm for HMM Decoding
The Viterbi algorithm is usually used to find the most likely hidden state sequence in an HMM. It is now commonly used in speech recognition, speech synthesis, diarization, keyword spotting, computational linguistics, and bioinformatics. This semester, in the course "Speech Technology", the speech recognition task treats the acoustic signal as the observed sequence of events and a string of text as its hidden cause. The Viterbi algorithm then finds the most likely string of text given the acoustic signal.
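A compact NumPy sketch of the algorithm, using the classic toy rainy/sunny HMM rather than real acoustic data:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for an observation sequence.
    pi: initial probs (N,), A: transition probs (N, N), B: emission probs (N, M)."""
    N, T = A.shape[0], len(obs)
    logd = np.zeros((T, N))               # best log-probability of a path ending in each state
    back = np.zeros((T, N), dtype=int)    # backpointers for path recovery
    logd[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = logd[t - 1][:, None] + np.log(A) + np.log(B[:, obs[t]])[None, :]
        back[t] = scores.argmax(axis=0)
        logd[t] = scores.max(axis=0)
    path = [int(logd[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy weather HMM: states {0: rainy, 1: sunny}, observations {0: walk, 1: shop, 2: clean}.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
print(viterbi([0, 1, 2], pi, A, B))       # [1, 0, 0] -> sunny, rainy, rainy
```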
2021-04-17
Dissertation Paraphraser
Paraphrasing and summarising are vital so that your essay doesn't become one long quote of other academics' work. To paraphrase a piece of text is to rewrite it in your own words. In this article, I will show you how I built an app that helps me rephrase the sentences I need.
2021-03-31
Text Representation for Unstructured Data
Text is a very important type of unstructured data, and how to represent it has long been an important research direction in machine learning. In this article, I will only discuss the basic methods: Bag of Words, TF-IDF (Term Frequency-Inverse Document Frequency), Topic Models, and Word Embeddings.
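As a quick sketch of the first two representations, scikit-learn's vectorisers (assumed installed) can be applied to a toy corpus like this:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

# Bag of Words: raw term counts per document.
bow = CountVectorizer()
print(bow.fit_transform(docs).toarray())
print(bow.get_feature_names_out())

# TF-IDF: counts reweighted by inverse document frequency.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray().round(2))
```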
2021-03-22
Train Word Embedding Vectors on Custom Corpus
When I was doing my dissertation project, I found that the performance of my model wasn't very good. I believe this is because the domain of the pre-trained GoogleNews-vectors-negative300 embeddings is different from that of my dataset. Hence, I decided to pre-train a word2vec model myself.
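A rough sketch of that training step with gensim; "my_corpus.txt" is a hypothetical file with one tokenised sentence per line, and the hyperparameters are only indicative.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Hypothetical corpus file: one whitespace-tokenised sentence per line, streamed from disk.
sentences = LineSentence("my_corpus.txt")
model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4, sg=1)

# Save in the same word2vec format as GoogleNews-vectors-negative300 for a drop-in replacement.
model.wv.save_word2vec_format("my_domain_vectors.bin", binary=True)
```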
2021-03-02
Twitter Hate Speech Detection
The objective of this task is to detect hate speech in tweets. For the sake of simplicity, let's say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So, the task is to classify racist or sexist tweets from other tweets.
2021-02-07
Weighted Word Embedding
Today I'm going to summarise some important points about weighted word embeddings for specific NLP tasks. Frankly speaking, this is a topic I had wanted to write about a few months ago; however, I was too busy during my MSc.
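One common weighting scheme is a TF-IDF-weighted average of word vectors; the sketch below uses random stand-in vectors and toy documents purely to show the shape of the computation, not real pre-trained embeddings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the movie was great", "the movie was terrible"]
dim = 300
rng = np.random.default_rng(0)

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
vocab = tfidf.get_feature_names_out()
word_vectors = {w: rng.normal(size=dim) for w in vocab}   # stand-in for real pre-trained vectors

def weighted_doc_vector(doc_idx):
    """Average the word vectors of a document, weighted by their TF-IDF scores."""
    row = X[doc_idx].toarray().ravel()
    weights = row / row.sum()
    return sum(weights[i] * word_vectors[vocab[i]] for i in range(len(vocab)) if weights[i] > 0)

print(weighted_doc_vector(0).shape)       # (300,)
```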
2021-01-25
Train Word2Vec Model on WSL
In this article, I'm going to build my own pre-trained word embeddings on WSL, which stands for Windows Subsystem for Linux, a compatibility layer for running Linux binary executables (in ELF format) natively on Windows 10. The reason I train the model on Linux instead of Windows is that running C++ and some other packages on Windows is not user-friendly.
2021-01-22
Sentiment Analysis for KKBOX
This sentiment classification task is based on review data for UtaPass and KKBOX from the Google Play platform. As a KKStreamer at KKBOX, I have become more interested in Natural Language Processing, especially text classification. First, I crawl the text data using web scraping tools, namely BeautifulSoup and Selenium. Second, I develop several neural network architectures, including a simple RNN, LSTM, GRU, and CNN, to name but a few, to detect the polarity of customer reviews.
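As a hedged sketch of one of those architectures, a small Keras LSTM polarity classifier might be wired up like this; the vocabulary size, sequence length, and dummy data are assumptions, not the article's actual setup.

```python
import numpy as np
from tensorflow.keras import layers, models

vocab_size, max_len = 10000, 100

# Embedding -> LSTM -> sigmoid output for binary polarity (positive vs negative).
model = models.Sequential([
    layers.Embedding(vocab_size, 128),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy integer-encoded reviews just to show the expected shapes.
X = np.random.randint(0, vocab_size, size=(32, max_len))
y = np.random.randint(0, 2, size=(32,))
model.fit(X, y, epochs=1, batch_size=8)
```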
2019-07-10
Categorising Song Genre by Analysing Lyrics
The ability to classify music in an automated manner has become increasingly more important with the advent of musical streaming services allowing greater access to music. Spotify alone hit 100 million users in 2016, with other services provided by companies such as Apple, Soundcloud and YouTube. In addition, there are huge numbers of professional musicians, approximately 53,000 in the USA alone, as well as amateurs who are producing music which needs to be classified. With this quantity of music, it is unfeasible to classify genres without an automated method.
2019-06-11