
A personal finance data pipeline project

I recently received a (family) project brief. In Taiwan, many credit and debit cards come with various promotions and deals, and many of them depend on one’s monthly spending, for example “get Y% cashback on spending below X NTD each month”. People also tend to carry a lot of different cards, so playing these off against each other can earn some nice pocket change, but one has to keep an eye on where one stands relative to the maximum limit (X). That’s where the project comes from: easier tracking of where a specific card’s spending stands within the monthly period. That doesn’t sound too difficult, right? Except the options for getting at the data are:

  1. A banking website with CAPTCHAs and no programmatic access
  2. An email received each day with a password-protected PDF containing the previous day’s transactions in a table

Neither of these is particularly appetizing, but both are similar to bits of what I do at #dayjob, and option 2 was closer to what I’ve been doing recently, so that’s where I landed. That is:

  • Forward the received email (the email provider does it)
  • Receive it in some compute environment
  • Decrypt the PDF
  • Extract the transaction data table
  • Clean and process the tabular data
  • Load the raw data into some data warehouse
  • Transform the data to get the right aggregation
  • Literally profit?
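The middle of that list can be sketched in standard-library Python. This is a minimal sketch, not the code behind this project. It assumes the forwarded email arrives as raw bytes, and that the PDF has already been decrypted and its table extracted into a list of string rows. A third-party library such as pdfplumber, whose `open()` accepts a `password` argument, can cover those two steps. A local SQLite database stands in for “some data warehouse”. The column layout, the table name `raw_transactions` and the helper names are all hypothetical.

```python
import sqlite3
from datetime import date
from email import message_from_bytes, policy
from typing import Optional


def pdf_attachments(raw_email: bytes) -> list[bytes]:
    """Return the bytes of every PDF part in a raw (RFC 5322) email message."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    # walk() also descends into nested parts, e.g. a forwarded message/rfc822.
    return [
        part.get_payload(decode=True)
        for part in msg.walk()
        if part.get_content_type() == "application/pdf"
    ]


def clean_rows(table: list[list[Optional[str]]]) -> list[tuple[str, str, int]]:
    """Turn raw extracted table rows into (iso_date, merchant, amount_ntd) tuples.

    Hypothetical layout: a header row first, then
    [date as YYYY/MM/DD, merchant, amount with thousands separators].
    """
    cleaned = []
    for row in table[1:]:  # skip the header row
        if len(row) < 3 or not row[0]:
            continue  # blank or spill-over rows from the PDF layout
        raw_date, merchant, raw_amount = (cell or "" for cell in row[:3])
        year, month, day = (int(p) for p in raw_date.strip().split("/"))
        amount = int(raw_amount.replace(",", "").strip())
        cleaned.append((date(year, month, day).isoformat(), merchant.strip(), amount))
    return cleaned


def load_raw(conn: sqlite3.Connection, source: str, rows: list[tuple[str, str, int]]) -> None:
    """Load one email's rows; replacing by `source` keeps re-runs idempotent."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS raw_transactions "
            "(source TEXT, txn_date TEXT, merchant TEXT, amount_ntd INTEGER)"
        )
        conn.execute("DELETE FROM raw_transactions WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO raw_transactions VALUES (?, ?, ?, ?)",
            [(source, *row) for row in rows],
        )
```

Keying the raw load on a per-email `source` value (for example the message ID) means that processing the same email twice replaces its rows rather than double-counting them. Taiwanese documents sometimes print dates with Minguo (ROC) years. If this statement does, the year needs 1911 added before the `date(...)` call.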

I was surprised how quickly this came together in the end (if “half a weekend” counts as quick), and it could well be the first piece of a “personal finance data warehouse”.
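The question from the brief, how close a card is to its monthly limit X, ends up as a small aggregation over the loaded transactions. This is a minimal sketch under several assumptions. Transactions sit in a SQLite table `raw_transactions` with ISO-format `txn_date` text and integer `amount_ntd` columns, which is a hypothetical schema. The promotion pays Y% on spending up to X per calendar month. The values for X and Y are placeholders.

```python
import sqlite3

# Placeholder promotion terms (hypothetical values): Y% cashback on up to X NTD per month.
MONTHLY_CAP_NTD = 5000  # "X"
CASHBACK_PERCENT = 3    # "Y"

MONTHLY_SPEND_SQL = """
SELECT substr(txn_date, 1, 7) AS month, SUM(amount_ntd) AS spent
FROM raw_transactions
GROUP BY month
ORDER BY month
"""


def monthly_status(conn: sqlite3.Connection) -> list[dict]:
    """Per calendar month: total spent, headroom left under the cap, cashback earned."""
    status = []
    for month, spent in conn.execute(MONTHLY_SPEND_SQL):
        # Clamp to [0, cap]: refunds can push a month negative, and nothing above X counts.
        eligible = min(max(spent, 0), MONTHLY_CAP_NTD)
        status.append({
            "month": month,
            "spent": spent,
            "remaining": MONTHLY_CAP_NTD - eligible,
            "cashback": eligible * CASHBACK_PERCENT // 100,  # integer NTD, no float noise
        })
    return status
```

For example, with 3,500 NTD spent so far in a month, this reports 1,500 NTD of headroom and 105 NTD of cashback under the placeholder terms. If the promotion actually follows the statement cycle rather than the calendar month, the `GROUP BY` key would need to change accordingly.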