Solving data loss in pipelines

About the project:

Client overview

The client, a financial services firm, relied heavily on precise data handling to support accurate reporting and decision-making processes. Given the critical nature of their financial data, they required a robust ETL solution to ensure high precision and integrity throughout data processing workflows, particularly for complex numerical calculations.

Tech Stack:

Python, Pandas, Raw SQL

Tech stack after migration:

Python, SQLAlchemy, Pandas, PostgreSQL

Time to deliver project:

2-4 Weeks

Problem

  • The client occasionally encountered a loss of precision in NUMBER type values after running ETL processes.

Inspection

  • We found that when data was loaded into Pandas DataFrames, column data types were inferred automatically rather than declared. This inference sometimes resulted in a loss of precision and distorted values, for example when a high-precision numeric value ends up in a floating-point column. The issue was particularly problematic for financial data, where accuracy is critical.
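The mechanism can be illustrated with a minimal sketch that uses only the Python standard library. It assumes the affected columns were inferred as 64-bit floats, which is an assumption and not a detail confirmed for the client's pipeline; the sample values are hypothetical. A Python float is the same IEEE 754 double that a float64 column holds, so the rounding shown here is the kind of distortion described above.

```python
from decimal import Decimal

# Hypothetical source value with more significant digits than a 64-bit float can hold.
raw = "12345678901234567.89"

exact = Decimal(raw)      # keeps every digit of the source string
as_float = float(raw)     # what a float-typed column would store

print(exact)              # 12345678901234567.89
print(Decimal(as_float))  # the exact value the float actually holds

# Small representation errors also accumulate across rows.
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.10")] * 10)
print(float_total == 1.0)                # False
print(decimal_total == Decimal("1.00"))  # True
```

The float keeps only about 15 to 17 significant decimal digits, so both the fractional part and the last integer digit of the sample value change, while the Decimal value round-trips unchanged.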

Recommendation

  • We recommend declaring column types explicitly in ETL processes instead of relying on inference, so that precision is not lost. SQLAlchemy ORM is well suited to this and offers additional benefits, such as precise control over database transactions.

Resolution

  • We used SQLAlchemy ORM to define column types explicitly, ensuring that precision was maintained throughout the ETL process. This solution also provided more granular control over transactions and the COMMIT process.
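As a rough sketch of what explicit column types and transaction control can look like with SQLAlchemy ORM (version 1.4 or later is assumed): the table name, column names, the precision and scale of 20 and 6, and the load_entries helper are all hypothetical and are not taken from the client's schema. The DDL is rendered for PostgreSQL without connecting to a database.

```python
from decimal import Decimal

from sqlalchemy import Column, Integer, Numeric, String
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.schema import CreateTable

Base = declarative_base()


class LedgerEntry(Base):
    """Hypothetical table; names, precision and scale are illustrative only."""

    __tablename__ = "ledger_entry"

    id = Column(Integer, primary_key=True)
    account = Column(String(32), nullable=False)
    # Fixed-point column: up to 20 significant digits, 6 after the decimal point.
    amount = Column(Numeric(20, 6), nullable=False)


def load_entries(engine, rows):
    """Insert (account, amount_string) pairs in a single transaction.

    session.begin() commits when the block exits normally and rolls back
    if an exception is raised, so a failed batch leaves no partial data.
    """
    with Session(engine) as session:
        with session.begin():
            session.add_all(
                [LedgerEntry(account=acct, amount=Decimal(amt)) for acct, amt in rows]
            )


# Render the CREATE TABLE statement for PostgreSQL without opening a connection.
ddl = str(CreateTable(LedgerEntry.__table__).compile(dialect=postgresql.dialect()))
print(ddl)
```

Passing amounts as strings and converting them with Decimal keeps the values out of floating point on the way to the NUMERIC column. Calling load_entries requires an engine created for a real PostgreSQL database, which is omitted here.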
