Collect Tweets using Twint

Data Science

Publish Date: 2020-10-16


Introduction

Twint is an advanced Twitter scraping tool written in Python that lets you scrape Tweets from Twitter profiles without using Twitter's API. Twint relies on Twitter's search operators, so you can scrape Tweets from specific users, scrape Tweets about specific topics, hashtags, and trends, and pick out sensitive information such as e-mail addresses and phone numbers from Tweets. I find this very handy, and you can get quite creative with it.
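To make the idea of search operators concrete, here is a minimal sketch of the kind of query string Twitter's search understands. The helper `build_search_query` is hypothetical and not part of Twint (Twint builds its own queries internally); it simply combines the standard Twitter operators `from:`, `lang:`, `since:` and `until:` with optional hashtags.

```python
from typing import Iterable, List, Optional


def build_search_query(
    username: Optional[str] = None,
    hashtags: Iterable[str] = (),
    lang: Optional[str] = None,
    since: Optional[str] = None,
    until: Optional[str] = None,
) -> str:
    """Combine Twitter search operators into a single query string.

    Hypothetical helper for illustration only; it is not Twint's API.
    """
    parts: List[str] = []
    if username:
        parts.append(f"from:{username}")  # Tweets posted by this account
    # Hashtag terms, normalised so "#bitcoin" and "bitcoin" both work
    parts.extend(f"#{tag.lstrip('#')}" for tag in hashtags)
    if lang:
        parts.append(f"lang:{lang}")  # restrict results to one language
    if since:
        parts.append(f"since:{since}")  # on or after this date (YYYY-MM-DD)
    if until:
        parts.append(f"until:{until}")  # before this date (YYYY-MM-DD)
    return " ".join(parts)


print(build_search_query(username="Bitcoin", lang="en",
                         since="2019-08-25", until="2020-08-25"))
# from:Bitcoin lang:en since:2019-08-25 until:2020-08-25
```

You can paste a string like this into Twitter's own search box to preview roughly what a Twint search with the same settings would target.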

Installation

Installing Twint can be tricky, so I'll walk you through it step by step below.

Install Twint from its repository.

git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt

Install (or upgrade) the proxy connector package aiohttp_socks.

pip3 install --upgrade aiohttp_socks
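Before going further, it can help to confirm that both packages are importable from the Python interpreter you plan to use. The sketch below uses only the standard library (`importlib.util.find_spec`); the helper name `check_installed` is hypothetical, and the module names `twint` and `aiohttp_socks` are assumed from the install steps above.

```python
import importlib.util
from typing import Dict, Iterable


def check_installed(packages: Iterable[str]) -> Dict[str, bool]:
    """Return {name: True/False} depending on whether each top-level module can be found."""
    # find_spec returns None (instead of raising) for a missing top-level module
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Module names assumed from the installation steps in this article
    for name, ok in check_installed(["twint", "aiohttp_socks"]).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

If either module shows up as missing, the usual culprit is that pip3 installed it into a different Python environment than the one running your script.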

On Windows, Twint may crash with the error OSError: [WinError 87] The parameter is incorrect. To fix it, open output.py in ./src/twint/twint and make the following changes:

Add the following import at line 9:

import string

Replace print(output.replace('\n', ' ')) on line 123 with the following:

# Keep only printable ASCII characters so the console can print the line without WinError 87
word = ''.join(ch for ch in output if ch in string.printable)
print(word.replace('\n', ' '))
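To see what the patch does to a Tweet before it reaches the console, here is a stand-alone sketch of the same filtering logic wrapped in a function. The name `strip_unprintable` is hypothetical; Twint itself only contains the inline version above.

```python
import string


def strip_unprintable(output: str) -> str:
    """Same logic as the patch: keep only characters in string.printable,
    then turn newlines into spaces so the Tweet prints on one line."""
    word = ''.join(ch for ch in output if ch in string.printable)
    return word.replace('\n', ' ')


print(strip_unprintable("To the moon 🚀\nHODL"))
# To the moon  HODL
```

Note that string.printable contains only ASCII characters, so the patch also drops emoji, accented letters and non-Latin scripts from the printed output (for example, "café" prints as "caf").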

If c.Since and c.Until have no effect, open url.py (in the same folder) and uncomment line 92 by removing the leading '#', so that it reads:

('query_source', 'typed_query'),
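The trailing comma suggests that url.py builds its request parameters as a list of (key, value) tuples. As a rough, hypothetical illustration (not Twint's actual code), this is how such a list, including the uncommented pair, turns into a URL query string with Python's standard `urllib.parse.urlencode`. Only the `('query_source', 'typed_query')` pair comes from url.py; the `q` value is an assumed example.

```python
from urllib.parse import urlencode

# Hypothetical parameter list; only ('query_source', 'typed_query') is taken from url.py
params = [
    ('q', 'from:Bitcoin since:2019-08-25 until:2020-08-25'),  # search operators
    ('query_source', 'typed_query'),  # the line uncommented above
]

# urlencode percent-encodes ':' and turns spaces into '+'
print(urlencode(params))
# q=from%3ABitcoin+since%3A2019-08-25+until%3A2020-08-25&query_source=typed_query
```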

Easy Example

Twint also supports custom output formatting and can be used as a Python module. More information can be found in the project's documentation.

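import twint

# Configure the search
c = twint.Config()
c.Pandas = True          # enable Pandas integration
c.Store_pandas = True    # keep the scraped Tweets in a DataFrame
c.Pandas_clean = True    # clear the DataFrame before each new scrape
c.Username = 'Bitcoin'   # Tweets from @Bitcoin
c.Lang = 'en'            # English Tweets only
c.Since = '2019-08-25'   # start date
c.Until = '2020-08-25'   # end date
c.Popular_tweets = True  # popular Tweets instead of the most recent ones

# Run the search and retrieve the results as a pandas DataFrame
twint.run.Search(c)
df = twint.storage.panda.Tweets_df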
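Since a long scrape can take a while, it may be worth sanity-checking the c.Since and c.Until strings before calling twint.run.Search. The sketch below is a hypothetical helper (`check_date_range` is not part of Twint) that assumes the 'YYYY-MM-DD' format used in the example above and relies only on the standard `datetime` module.

```python
from datetime import date
from typing import Tuple


def check_date_range(since: str, until: str) -> Tuple[date, date]:
    """Parse 'YYYY-MM-DD' strings and make sure since is earlier than until.

    Hypothetical helper; Twint does not require or provide it.
    """
    start = date.fromisoformat(since)  # raises ValueError on a malformed date
    end = date.fromisoformat(until)
    if start >= end:
        raise ValueError(f"since ({since}) must be earlier than until ({until})")
    return start, end


start, end = check_date_range('2019-08-25', '2020-08-25')
print((end - start).days)  # 366, because the range includes 29 February 2020
```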

Alternatively, you can scrape Tweets from the command line. The following command saves Tweets from @Bitcoin posted since 2014-01-01 to tweets.csv:

twint -u Bitcoin --csv --output tweets.csv --since 2014-01-01
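If you want to run the same command for several accounts from a Python script, one option is to build the argument list yourself and hand it to `subprocess`. The sketch below uses hypothetical helper names (`build_twint_command`, `run_twint`) and covers only the flags shown above (-u, --csv, --output, --since); it does not add any other Twint options.

```python
import shlex
import subprocess
from typing import List, Optional


def build_twint_command(username: str, output: str, since: Optional[str] = None) -> List[str]:
    """Build the argument list for the CLI call shown above (hypothetical helper)."""
    cmd = ["twint", "-u", username, "--csv", "--output", output]
    if since:
        cmd += ["--since", since]
    return cmd


def run_twint(cmd: List[str]) -> None:
    # Needs the twint executable on PATH; raises CalledProcessError if it exits with an error
    subprocess.run(cmd, check=True)


cmd = build_twint_command("Bitcoin", "tweets.csv", since="2014-01-01")
print(shlex.join(cmd))
# twint -u Bitcoin --csv --output tweets.csv --since 2014-01-01
```

Passing a list rather than a single string avoids shell quoting problems, and `shlex.join` (Python 3.8+) is only used here to show the equivalent command line.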

CLI Basic Examples and Combos

More details about the commands and options can be found in the Twint wiki.

Conclusion

Twint has several benefits. First, it can fetch almost all of an account's Tweets, whereas the Twitter API only returns the most recent 3,200. Second, it can be used anonymously, without signing up for a Twitter developer account. Finally, it is quick to set up and has no rate limits. Twint has many more search features than the ones shown here, so it is well worth experimenting with it yourself!

References

Yang Wang

https://penguinwang96825.github.io/Yang-Tech-Blog/Yang-Tech-Blog/2020/10/16/2020-10-16-collect-tweets-using-twint/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Yang Wang.
