Introduction
Twint is an advanced Twitter scraping tool written in Python that lets you scrape Tweets from Twitter profiles without using Twitter’s API. It uses Twitter’s search operators to scrape Tweets from specific users, scrape Tweets relating to specific topics, hashtags, and trends, and pick out sensitive information such as e-mail addresses and phone numbers from Tweets. I find this quite handy, and you can get fairly creative with it.
Installation
You may run into problems when installing Twint, so the steps below walk through the process.
- Install Twint from its repository.
git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt
- Install a proxy connector package.
pip install --upgrade aiohttp_socks
- To fix the error “OSError: [WinError 87] The parameter is incorrect.”, open output.py in ./src/twint/twint and make the following changes:
- Add the following to line #9:
import string
- Replace print(output.replace('\n', ' ')) on line #123 with the following:
word = ''
for i in output:
    if i in string.printable:
        word = word + i
print(word.replace('\n', ' '))
- If c.Until and c.Since do not work, uncomment line 92 (remove the ‘#’) in the url.py file:
('query_source', 'typed_query'),
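The output.py change in the steps above keeps only the characters found in string.printable, which appears to be what stops the Windows console from failing on characters such as emoji. The same idea can be tried in isolation with the minimal sketch below; the function name to_printable and the sample text are illustrative assumptions, not part of Twint.

```python
import string


def to_printable(text: str) -> str:
    """Drop every character that is not in string.printable, then flatten newlines."""
    # string.printable covers ASCII letters, digits, punctuation and whitespace,
    # so emoji and other non-ASCII characters are removed here.
    kept = ''.join(ch for ch in text if ch in string.printable)
    # Mirror the patched print call: one tweet per output line.
    return kept.replace('\n', ' ')


# Hypothetical tweet text containing an emoji and a line break.
sample = "Bitcoin to the moon \U0001F680\nHODL"
print(to_printable(sample))
```

Note that this filter also removes legitimate non-ASCII text (accented letters, non-Latin scripts), so it is a workaround for console output rather than a general cleaning step.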
Easy Example
Twint now allows custom formatting and can be used as a module. More information can be found over here.
import twint
c = twint.Config()
c.Pandas = True
c.Store_pandas = True
c.Pandas_clean = True
c.Username = 'Bitcoin'
c.Lang = 'en'
c.Since = '2019-08-25'
c.Until = '2020-08-25'
c.Popular_tweets = True
# Run
twint.run.Search(c)
df = twint.storage.panda.Tweets_df
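Because c.Since and c.Until are plain strings, a mistyped or reversed date is easy to miss. As a rough sketch, the pair can be checked with the standard library before being assigned to the config; check_range is a hypothetical helper written for this illustration and is not part of Twint.

```python
from datetime import date


def check_range(since: str, until: str) -> int:
    """Parse two YYYY-MM-DD strings and return the number of days between them."""
    # date.fromisoformat raises ValueError on malformed input such as '2019-8-25x'.
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    if start >= end:
        raise ValueError(f"since ({since}) must be earlier than until ({until})")
    return (end - start).days


# The same dates used in the example above.
print(check_range('2019-08-25', '2020-08-25'))
```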
The other way is to scrape Tweets from the command line.
twint -u Bitcoin --csv --output tweets.csv --since 2014-01-01
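Once the command has written tweets.csv, the file can be inspected with Python's csv module. The sketch below counts Tweets per year; it assumes the CSV has a date column formatted as YYYY-MM-DD, which should be checked against the header of your own file. The function name tweets_per_year and the in-memory sample data are illustrative only.

```python
import csv
import io
from collections import Counter


def tweets_per_year(csv_file) -> Counter:
    """Count rows per year, assuming a 'date' column formatted as YYYY-MM-DD."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # The first four characters of the assumed date format are the year.
        counts[row["date"][:4]] += 1
    return counts


# In-memory stand-in for tweets.csv so the sketch runs without scraping anything.
sample = io.StringIO(
    "date,username,tweet\n"
    "2014-03-01,Bitcoin,first\n"
    "2014-07-15,Bitcoin,second\n"
    "2015-01-02,Bitcoin,third\n"
)
print(tweets_per_year(sample))
```

For a real file, the same function can be called as tweets_per_year(open("tweets.csv", newline="", encoding="utf-8")).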
CLI Basic Examples and Combos
More details about the commands and options can be found in the wiki.
Conclusion
There are several benefits to using Twint. First, it can fetch all Tweets, whereas the Twitter API is limited to the last 3,200 Tweets. Second, it can be used anonymously, without signing up for a Twitter Developer account. Finally, the initial setup is fast and there are no rate limits. Twint has many more search features to explore, and it is well worth experimenting with them yourself!