
Collect Tweets using Twint


Introduction

Twint is an advanced Twitter scraping tool written in Python that lets you scrape Tweets from Twitter profiles without using Twitter’s API. Twint makes use of Twitter’s search operators, so you can scrape Tweets from specific users, scrape Tweets relating to specific topics, hashtags, and trends, and pick out sensitive information such as e-mail addresses and phone numbers from Tweets. I find this quite handy, and you can get fairly creative with it as well.

Installation

Installing Twint can be tricky, so I’ll explain how to do it in the steps below.

  1. Install Twint from its repository.
git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt
  2. Install a proxy connector package.
pip install --upgrade aiohttp_socks
  3. To fix OSError: [WinError 87] The parameter is incorrect., open output.py in ./src/twint/twint and make the following changes:
  • Add the following at line 9:
import string
  • Replace print(output.replace('\n', ' ')) at line 123 with the following:
word = ''
for i in output:
	if i in string.printable:
		word = word + i
print(word.replace('\n', ' '))
  4. If c.Until and c.Since do not work, uncomment line 92 of url.py (remove the ‘#’):
('query_source', 'typed_query'), 
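
To see what the output.py patch in step 3 does, here is a minimal standalone sketch of the same filtering logic. The function name to_printable_line is hypothetical and is not part of Twint; it only mirrors the loop shown above, dropping every character that is not in string.printable (emoji, curly quotes, and so on) before newlines are flattened to spaces.

```python
import string


def to_printable_line(output: str) -> str:
    """Keep only characters found in string.printable, then flatten newlines.

    Hypothetical helper that mirrors the patch for output.py; not a Twint function.
    """
    # Drop anything outside ASCII printable characters (e.g. emoji, curly quotes).
    word = ''.join(ch for ch in output if ch in string.printable)
    # Same final step as the original print call: one tweet per line.
    return word.replace('\n', ' ')


if __name__ == '__main__':
    print(to_printable_line('Bitcoin\u2019s price\nis up'))
```

Note that this also removes legitimate non-ASCII text such as accented letters, so it is a workaround for the console error rather than a lossless fix.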

Easy Example

Twint now allows custom formatting and can be used as a module. More information can be found in the Twint repository.

import twint

c = twint.Config()
c.Pandas = True
c.Store_pandas = True
c.Pandas_clean = True
c.Username = 'Bitcoin'
c.Lang = 'en'
c.Since = '2019-08-25'
c.Until = '2020-08-25'
c.Popular_tweets = True

# Run
twint.run.Search(c)
df = twint.storage.panda.Tweets_df
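
Since a mistyped date in c.Since or c.Until is easy to miss, it can help to check the window before starting a long scrape. The sketch below is an assumption-laden helper, not part of Twint: check_date_window is a hypothetical name, and it only validates the date-only 'YYYY-MM-DD' form used in the example above.

```python
from datetime import datetime
from typing import Tuple


def check_date_window(since: str, until: str) -> Tuple[str, str]:
    """Validate a 'YYYY-MM-DD' date window and return it unchanged.

    Hypothetical helper, not a Twint function. Raises ValueError if either
    date is malformed or if `since` is not strictly before `until`.
    """
    fmt = '%Y-%m-%d'
    start = datetime.strptime(since, fmt)  # raises ValueError on a bad format
    end = datetime.strptime(until, fmt)
    if start >= end:
        raise ValueError('since must be earlier than until')
    return since, until


if __name__ == '__main__':
    since, until = check_date_window('2019-08-25', '2020-08-25')
    print(since, until)
```

The returned values can then be assigned to the config, for example c.Since, c.Until = check_date_window('2019-08-25', '2020-08-25').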

The other way is to scrape the tweets from the command line.

twint -u Bitcoin --csv --output tweets.csv --since 2014-01-01 
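
If you want to launch the same command from a Python script, one possible approach is to build the argument list first and hand it to subprocess. This is only a sketch: build_twint_command is a hypothetical helper, and it uses just the flags that appear in the command above (-u, --csv, --output, --since).

```python
import shlex
from typing import List, Optional


def build_twint_command(username: str, output: str,
                        since: Optional[str] = None) -> List[str]:
    """Build the argument list for the twint CLI call shown above.

    Hypothetical helper; only the flags from the example command are used.
    """
    cmd = ['twint', '-u', username, '--csv', '--output', output]
    if since:
        cmd += ['--since', since]
    return cmd


if __name__ == '__main__':
    cmd = build_twint_command('Bitcoin', 'tweets.csv', since='2014-01-01')
    # Print the command instead of running it, so the sketch works without Twint.
    print(shlex.join(cmd))
    # To actually run it (requires Twint to be installed and on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when a username or file name contains spaces.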

CLI Basic Examples and Combos

More details about the commands and options are available in the Twint wiki.

Conclusion

There are several benefits to using Twint. First, it can fetch almost all Tweets, whereas the Twitter API is limited to the last 3200 Tweets. Second, it can be used anonymously, without signing up as a Twitter developer. Finally, the initial setup is fast and there are no rate limits. Twint has many more search features than those shown here, so it is well worth experimenting with it yourself!

References

  1. https://medium.com/analytics-vidhya/how-to-scrape-tweets-from-twitter-with-python-twint-83b4c70c5536
  2. https://github.com/twintproject/twint

Author: Yang Wang
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Yang Wang.