This paper seeks to understand the effects of climate shocks on firms and labor markets in Brazil.
Under the guidance of Daniela Scur, I worked to create a data pipeline from scratch that did the following:
- Web-scrape ~5,500 differently structured Brazilian municipality websites, written in Portuguese, and download all available PDFs (approximately 25 million).
- The pipeline adapted to the different website structures and worked around CAPTCHA challenges using several techniques.
- Using pdfplumber, I converted the information in these differently structured PDFs into consistently structured Excel files.
- Additionally, I built interactive geospatial visualizations for this project in Python and Tableau.
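
The PDF-discovery part of the scraping step might look roughly like the sketch below. It is an illustration only, not the project's actual code: the names `PdfLinkParser` and `find_pdf_links`, the sample HTML, and the `example.gov.br` URL are all hypothetical, and it uses only the Python standard library. The idea is to avoid depending on any one site's layout by scanning every tag for an `href` or `src` attribute whose path ends in `.pdf`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect href/src attribute values that point to PDF files (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        # Look at every tag, not just <a>, since sites embed PDFs in different ways.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Check the path only, so query strings do not hide the extension.
                is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
                if is_pdf and absolute not in self.pdf_urls:
                    self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return the de-duplicated absolute PDF URLs found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


# Made-up page fragment standing in for a municipality website.
sample_html = """
<a href="/docs/decreto_01.pdf">Decreto 01</a>
<iframe src="arquivos/relatorio.PDF?download=1"></iframe>
<a href="/contato.html">Contato</a>
"""
print(find_pdf_links(sample_html, "https://example.gov.br/portal/"))
```

A full pipeline would also need page fetching, crawling across pages, retries, and rate limiting, none of which are shown here.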