Create Your First Project

Start adding your projects to your portfolio. Click on "Manage Projects" to get started

.pro-gallery-wix-wrapper {display: block !important;} .pro-gallery-wix-wrapper .gallery-item-container {opacity: 1 !important; display: block !important;}

Automação

Project type

Automation - bot

Date

April 2025

Location

Brazil

Dataset

Public domain data

Link 🔗

GitHub

Purpose

The script allows you to read and extract any information from PDF files. It processes the documents, extracts the specified data and organizes it in an Excel spreadsheet and/or inserts it into a MySQL database.

How it works

1. Reading the PDFs: The script accesses a directory containing the PDF files and goes through each document.
2. Data extraction: Uses regular expressions (Regex) to identify patterns in the extracted text, allowing any desired information to be captured.
3. Storage: The extracted data is entered into a MySQL database and saved in an Excel file for future analysis.
4. Error handling: Any failure in the extraction is logged and the document is marked as “Error”.

Solved Problems

- Automates the extraction of information without the need for manual reading.
- Improves efficiency when processing large volumes of documents.
- Facilitates data organization and analysis.

Example of use

Scenario: A company receives hundreds of PDF invoices and needs to extract the invoice number and date for accounting purposes.
1. The script is configured to recognize patterns such as “INVOICE #” and “DATE:”.
2. When you run the code, it processes all the PDFs in the directory and extracts the information.
3. The extracted data is stored in the database and exported to Excel.