# Batch Processing
source: https://developer.mastercard.com/open-finance-us/documentation/products/manage/data-enrichment/batch/index.md

This section explains how to use [Data Enrichment](https://developer.mastercard.com/open-finance-us/documentation/products/manage/data-enrichment/index.md) via batch processing. We also have an [API](https://developer.mastercard.com/open-finance-us/documentation/products/manage/data-enrichment/api/index.md) solution.

## How it Works - Batch {#how-it-works---batch}

The batch process allows you to upload files containing up to 6 million transactions each. Each file is processed by our enrichment logic and returned to you. Batch files must be formatted as CSV, follow the column header specifications, and be encrypted and transferred via SFTP.

## SFTP Onboarding {#sftp-onboarding}

To enable the batch process, please work with your Customer Success Manager (CSM) to provide the following:

* The public IP address from which you will access the SFTP server.
* Your public SSH Key (this will allow passwordless authentication to the SFTP server).


Note: If you do not use your SFTP access for 90 days, it will be deactivated and you will need to contact us again to reactivate it.

Your Customer Success Manager will also be able to help with the process of creating the necessary encryption keys, which are not available via the Mastercard Developers project dashboard by default.

## Encryption Key Setup {#encryption-key-setup}

To send data to the Open Finance Batch process, you will need encryption keys to secure the data exchanges with Mastercard.

See the [API Basics](https://developer.mastercard.com/open-finance-us/documentation/onboarding/index.md) section for details of how to create a project and obtain the required project credentials.
Tip: You will need to note your partner ID once you have created the project, even if you are only using the batch process and not accessing any APIs.

You will also need encryption keys, which you can access from the **Projects** section on Mastercard Developers after logging in (select your project, then look for the project credentials).

You will also need to request production access. Contact your Customer Success Manager (CSM) for further details and to ensure that the option to create the required keys is enabled.

### Encryption Keys {#encryption-keys}

In addition to the credentials used to authenticate your requests, when using the batch file transfer process, you are required to encrypt the data exchanged with Mastercard.

There are two types of encryption keys:

1. **Client Encryption Key**

2. **Mastercard Encryption Key**

Both will be required.

By default you will not see the option to create keys for an Open Finance project on your Mastercard Developers project dashboard. Your Customer Success Manager will enable this functionality for you after you request production access and provide your details.

Once enabled, you will be able to create keys as follows:

![](https://static.developer.mastercard.com/content/open-finance-us/uploads/mcd-add-key.png)

Once created, you can view and download the keys as required:

![](https://static.developer.mastercard.com/content/open-finance-us/uploads/mcd-keys-prod-keys-openfinance.png)

To read more about Encryption Key creation on Mastercard Developers see [Onboarding for Encryption Key Management](https://developer.mastercard.com/open-finance-us/documentation/onboarding-encryption-keys/index.md).

To learn more about how data between client applications and Mastercard is secured, see [Securing Payload Data Using Payload Encryption](https://developer.mastercard.com/platform/documentation/authentication/securing-sensitive-data-using-payload-encryption/) in the [Mastercard Developers Platform](https://developer.mastercard.com/platform/documentation) documentation. To assist you in performing payload encryption and decryption, Mastercard provides [encryption libraries](https://developer.mastercard.com/platform/documentation/authentication/securing-sensitive-data-using-payload-encryption/#client-libraries).

## Encrypting Data and Sending to SFTP {#encrypting-data-and-sending-to-sftp}

There are two aspects to the encryption process:

* You will need to encrypt your CSV data file using an AES-256 ephemeral key.
* Then, you need to encrypt your AES-256 ephemeral key, IV byte array, and encryption tag using JSON Web Encryption (JWE). This should then be packaged as a metadata file in JSON format.

These two files should then be placed in a zip and sent to Mastercard via SFTP.

Follow the steps below to securely send a batch file to Mastercard:
Diagram data-enrichment-batch

1. Prepare the transaction data in a CSV file suitable for batch processing (see [File Format](https://developer.mastercard.com/open-finance-us/documentation/products/manage/data-enrichment/batch/index.md#file-format) below).

2. Create an initialization vector (IV) byte array and an ephemeral AES 256-bit key as a data encryption key. When creating this key, use the AES algorithm in GCM mode (AES/GCM/NoPadding). Sample Java code for creating both an IV byte array and an AES ephemeral key is available [on GitHub](https://github.com/Mastercard/client-encryption-java/blob/main/src/main/java/com/mastercard/developer/encryption/aes/AESEncryption.java#L28).

3. Encrypt the CSV file using the ephemeral key and IV byte array from step 2, with the AES encryption algorithm in GCM mode. Name this file `transactions_YYYYMMDDHHMMSS.bin`. The encryption result will also return a `tag` value that will be used in the next step. The following Python example illustrates this:

* Python

```python
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

key = get_random_bytes(32)  # 32 bytes x 8 bits = 256 bits
iv = get_random_bytes(16)
cipher = AES.new(key, AES.MODE_GCM, nonce=iv, use_aesni=True)

blocksize = 64 * 1024 * 1024  # 64 MB block

with open("path/to/raw/transactions_YYYYMMDDHHMMSS.csv", "rb") as reader, \
     open("path/to/encrypted/transactions_YYYYMMDDHHMMSS.bin", "wb") as writer:

    # Encrypt the file in 64 MB chunks to avoid loading the entire file into memory
    while True:
        block = reader.read(blocksize)
        if not block:
            break
        writer.write(cipher.encrypt(block))

    # The authentication tag is needed for the metadata in the next step
    tag = cipher.digest()
```

4. Mastercard will need to know how to decrypt your encrypted CSV file. To do this you need to send the necessary details within a JWE. Encode the `key`, `iv`, and `tag` values as base64 encoded strings (with ASCII codec), and create a JSON metadata payload in the following format:

   ```json
   {
     "key": "5gO+78x1ZjKG3BzwY8AP9dovzUyrBtL/HNgrVIggEIw=",
     "iv": "9PYO6dNmmTuchtbpwT7ldA==",
     "tag": "+vEElaaYP05RMAvR42QEHw=="
   }
   ```

   Encrypt this metadata with JWE Encryption. You can use the [Mastercard Client Encryption Library](https://github.com/Mastercard?&q=client-encryption) to do so. You will need to use your Client Encryption Key tied to your project when setting up the JWE configuration.

* Python

```python
import json
from client_encryption.jwe_encryption_config import JweEncryptionConfig
from client_encryption.jwe_encryption import encrypt_payload

metadata = {
    "key": "5gO+78x1ZjKG3BzwY8AP9dovzUyrBtL/HNgrVIggEIw=",
    "iv": "9PYO6dNmmTuchtbpwT7ldA==",
    "tag": "+vEElaaYP05RMAvR42QEHw=="
}

payload = {
    "enc": metadata,
}

# Path to the Client Encryption Key tied to your project
pub_cert_path = "path/to/client-encryption-key"
conf = JweEncryptionConfig({
    "paths": {
        "$": {
            "toEncrypt": {
                "enc": "enc",
            },
            "toDecrypt": {}
        }
    },
    "encryptedValueFieldName": "jwe",
    "encryptionCertificate": pub_cert_path
})

enc = encrypt_payload(payload=payload, config=conf)
jwe = enc["enc"]

# Open in write mode; the default mode is read-only
with open("path/to/transactions_YYYYMMDDHHMMSS.metadata.json", "w") as metaf:
    metaf.write(json.dumps(jwe))
```

5. The JWE created will be used in the next step.

6. Package the returned JWE into a JSON file called `transactions_YYYYMMDDHHMMSS.metadata.json` in the following form:

* Json

```json
  {
    "jwe": "eyJhbGciOiJSU0EtT0FFUC0yNTYiLCJlbmMiOiJBMjU2R0NNIiwia2lkIjoiZTI0YzAyZDU0NGIyYjExNmJhMDZlZDc2NWNjZDY1MzI3OGZkNmIxODQ2MTAxZTU0MjdlNjM2YzkwY2RlZjk0OSIsImN0eSI6ImFwcGxpY2F0aW9uL2pzb24ifQ.YbgsnILXFkVba7mU0PUQpMwsWuVASTAfTIeokghGnCgQbLpjKag75slXCQyroJ27fK0Y_Qgus05yWKYKRaP6zdP1YO5mLIm44iz-EPpz1e3v6DgkZ5MMTxeMqREPkC03cTO9OIajIw0UgbafrmfJ18izJojKerdsnMvnEKZ8_23oqMYN7CHW9Jg5HuEaqxKme0OoITJpzs4Jk-bPsvtxeWam8iNH9l77hWuf2wyTIW_qyx5J9vBXNB6_96uTCtHECttOQkk7-Tk1zYmhKqR4ZG8AOMTIE4YHFmkQoSLxU8YAuMDQFN4Xr5egoo9AlUkN7urrf37gdgUG1ldPVxgXcg.EjRNhXhJUj50-q9DB7XSkg.4fGc5xCuoySNzbMhDYUGRdEA2FfRqlV5yEdrDMlu3sz3tHycLrLJXYBaPyueCUWsi2RJpG85-nY-o689Z2TOQkfu9UZty8nyLfmFfhpUGwhcCaN-OZaXSYOxsLR2UIgbJ8VuFnml1dJ5BfswIaJF2UZXjfbA6yNHig4Jvg.aPVGN0J-JUfqzYhuQ_pkFA"
  }
  
```

7. Package `transactions_YYYYMMDDHHMMSS.bin` and `transactions_YYYYMMDDHHMMSS.metadata.json` into a zip archive using this naming convention: `transactions_YYYYMMDDHHMMSS.zip`

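8. Transfer the zip package to Mastercard via SFTP, placing it in your `/input` directory (see [SFTP folder structure](https://developer.mastercard.com/open-finance-us/documentation/products/manage/data-enrichment/batch/index.md#sftp-folder-structure) below).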
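
Steps 4 and 7 above can be sketched with the Python standard library alone. This is a minimal sketch rather than an official implementation: the helper names `build_metadata` and `package_batch` are hypothetical, the `.bin` and `.metadata.json` files are assumed to exist on disk already, and the JWE encryption of the metadata is left out.

```python
import base64
import zipfile
from pathlib import Path


def build_metadata(key: bytes, iv: bytes, tag: bytes) -> dict:
    """Base64-encode the raw AES-GCM values as ASCII strings (step 4)."""
    return {
        "key": base64.b64encode(key).decode("ascii"),
        "iv": base64.b64encode(iv).decode("ascii"),
        "tag": base64.b64encode(tag).decode("ascii"),
    }


def package_batch(bin_path: str, metadata_path: str) -> Path:
    """Zip the encrypted data file and its metadata file (step 7).

    The zip name is derived from the .bin file name, so
    transactions_X.bin becomes transactions_X.zip in the same directory.
    """
    bin_file = Path(bin_path)
    zip_path = bin_file.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Both files go at the top level of the archive (an assumption;
        # the source does not say whether nested paths are accepted).
        archive.write(bin_file, arcname=bin_file.name)
        archive.write(metadata_path, arcname=Path(metadata_path).name)
    return zip_path
```

The dictionary returned by `build_metadata` is the plaintext payload that step 4 then encrypts with JWE; it should never be written to the zip unencrypted.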

## Decrypt the Resulting Data File {#decrypt-the-resulting-data-file}

Downloading and decrypting the output file is the reverse of the process above for sending files. You will need to download a zip, extract the JWE from the metadata file, and use the ephemeral encryption key details from the JWE to then decrypt the output file itself. There may be two output files, as any failures will be returned in a second file along with error messages.

The process is as follows:
Diagram data-enrichment-batch2

1. Transfer the output file from the Mastercard SFTP to your network. The file will be named with this naming convention: `transactions_YYYYMMDDHHMMSS_output.zip`

2. Unzip the files:

   * `transactions_YYYYMMDDHHMMSS.metadata.json` - This file contains the encryption metadata.
   * `transactions_YYYYMMDDHHMMSS_success.bin` - This is an encrypted file which contains the successfully enriched records.
   * `transactions_YYYYMMDDHHMMSS_fail.bin` - This is an encrypted file which contains those records where data enrichment failed. You can use the error messages to help diagnose what caused the problem.

   The metadata JSON file contains two fields, `jwe` and `fingerprint`:
   * `jwe` contains the encrypted `key` and `iv` used during the encryption of the output files, plus the `successTag` and `failTag` extracted from encrypted output files.
   * `fingerprint` contains the fingerprint of the key pair used for encrypting the output files.

   For example:
   * Json

   ```json
       {
          "jwe": "eyJhbGciOiJSU0EtT0FFUC0yNTYiLCJlbmMiOiJBMjU2R0NNIiwia2lkIjoiZTI0YzAyZDU0NGIyYjExNmJhMDZlZDc2NWNjZDY1MzI3OGZkNmIxODQ2MTAxZTU0MjdlNjM2YzkwY2RlZjk0OSIsImN0eSI6ImFwcGxpY2F0aW9uL2pzb24ifQ.YbgsnILXFkVba7mU0PUQpMwsWuVASTAfTIeokghGnCgQbLpjKag75slXCQyroJ27fK0Y_Qgus05yWKYKRaP6zdP1YO5mLIm44iz-EPpz1e3v6DgkZ5MMTxeMqREPkC03cTO9OIajIw0UgbafrmfJ18izJojKerdsnMvnEKZ8_23oqMYN7CHW9Jg5HuEaqxKme0OoITJpzs4Jk-bPsvtxeWam8iNH9l77hWuf2wyTIW_qyx5J9vBXNB6_96uTCtHECttOQkk7-Tk1zYmhKqR4ZG8AOMTIE4YHFmkQoSLxU8YAuMDQFN4Xr5egoo9AlUkN7urrf37gdgUG1ldPVxgXcg.EjRNhXhJUj50-q9DB7XSkg.4fGc5xCuoySNzbMhDYUGRdEA2FfRqlV5yEdrDMlu3sz3tHycLrLJXYBaPyueCUWsi2RJpG85-nY-o689Z2TOQkfu9UZty8nyLfmFfhpUGwhcCaN-OZaXSYOxsLR2UIgbJ8VuFnml1dJ5BfswIaJF2UZXjfbA6yNHig4Jvg.aPVGN0J-JUfqzYhuQ_pkFA",
          "fingerprint": "cfb9bf8efec58af5f5231d73d7785f2eb3991c3dae8b1539a3cde9fdb6cd6dab"
       }
       
   ```

3. Extract the JWE from the metadata and decrypt it to obtain the ephemeral key that Mastercard used to encrypt the processed file. You can do this with the JWE decryption support in the Mastercard Client Encryption Library. Use the private key of the Mastercard Encryption Key associated with `fingerprint` to decrypt the `jwe` value.

4. Decrypting the JWE should produce a JSON object containing `key`, `iv`, `successTag` and `failTag`. These will be base64 encoded strings (with ASCII codec). Decode them to bytes. You will then be able to use these to decrypt the data files.

5. With AES in GCM mode and the `key` and `iv` from the previous step, decrypt the `.bin` files:

   * To decrypt the success file, use `transactions_YYYYMMDDHHMMSS_success.bin` as the payload and the `successTag` as the tag.
   * To decrypt the fail file, use `transactions_YYYYMMDDHHMMSS_fail.bin` as the payload and the `failTag` as the tag.

The following Python code example shows the decryption of the JWE and then the data files.
* Python

```python
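import base64

from Crypto.Cipher import AES
from client_encryption.jwe_encryption_config import JweEncryptionConfig
from client_encryption.jwe_encryption import decrypt_payload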

enc_metadata = {
  "jwe": "eyJhbGciOiJSU0EtT0FFUC0yNTYiLCJlbmMiOiJBMjU2R0NNIiwia2lkIjoiZTI0YzAyZDU0NGIyYjExNmJhMDZlZDc2NWNjZDY1MzI3OGZkNmIxODQ2MTAxZTU0MjdlNjM2YzkwY2RlZjk0OSIsImN0eSI6ImFwcGxpY2F0aW9uL2pzb24ifQ.YbgsnILXFkVba7mU0PUQpMwsWuVASTAfTIeokghGnCgQbLpjKag75slXCQyroJ27fK0Y_Qgus05yWKYKRaP6zdP1YO5mLIm44iz-EPpz1e3v6DgkZ5MMTxeMqREPkC03cTO9OIajIw0UgbafrmfJ18izJojKerdsnMvnEKZ8_23oqMYN7CHW9Jg5HuEaqxKme0OoITJpzs4Jk-bPsvtxeWam8iNH9l77hWuf2wyTIW_qyx5J9vBXNB6_96uTCtHECttOQkk7-Tk1zYmhKqR4ZG8AOMTIE4YHFmkQoSLxU8YAuMDQFN4Xr5egoo9AlUkN7urrf37gdgUG1ldPVxgXcg.EjRNhXhJUj50-q9DB7XSkg.4fGc5xCuoySNzbMhDYUGRdEA2FfRqlV5yEdrDMlu3sz3tHycLrLJXYBaPyueCUWsi2RJpG85-nY-o689Z2TOQkfu9UZty8nyLfmFfhpUGwhcCaN-OZaXSYOxsLR2UIgbJ8VuFnml1dJ5BfswIaJF2UZXjfbA6yNHig4Jvg.aPVGN0J-JUfqzYhuQ_pkFA",
  "fingerprint": "cfb9bf8efec58af5f5231d73d7785f2eb3991c3dae8b1539a3cde9fdb6cd6dab"
}

jwe = enc_metadata["jwe"]
fingerprint = enc_metadata["fingerprint"]

# get the private key (Mastercard Encryption Key) associated with fingerprint
private_key_file = "path/to/certs/keyname-mastercard-encryption-key.p12"

payload = {
  "enc": jwe,
}

cfg = {
  "paths": {
    "$": {
      "toEncrypt": {},
      "toDecrypt": {
        "enc.jwe": "dec",
      }
    }
  },
  "encryptedValueFieldName":"jwe",
  "decryptionKey": private_key_file,
  "decryptionKeyPassword": "<your-passphrase>"
}

conf = JweEncryptionConfig(cfg)
metadata = decrypt_payload(payload=payload, config=conf)

key = base64.b64decode(metadata["key"])
iv = base64.b64decode(metadata["iv"])
# Field names follow the metadata description above (successTag, failTag)
tagSuccess = base64.b64decode(metadata["successTag"])
tagFail = base64.b64decode(metadata["failTag"])


def decrypt_file(key: bytes, nonce: bytes, enc_file: str, dec_file: str, tag: bytes):
    blocksize = 64 * 1024 * 1024  # 64 MB block

    with open(enc_file, 'rb') as reader, open(dec_file, 'wb') as writer:
        cipher = AES.new(key, AES.MODE_GCM, nonce=nonce, use_aesni=True)

        # Decrypt the file in 64 MB chunks to avoid loading the entire file into memory
        while True:
            block = reader.read(blocksize)
            if not block:
                break
            writer.write(cipher.decrypt(block))

        # Raises ValueError if the tag does not match the decrypted data
        cipher.verify(tag)


enc_file = "path/to/transactions_YYYYMMDDHHMMSS_success.bin"
dec_file = "path/to/transactions_YYYYMMDDHHMMSS_success.csv"
decrypt_file(key, iv, enc_file, dec_file, tagSuccess)

enc_file = "path/to/transactions_YYYYMMDDHHMMSS_fail.bin"
dec_file = "path/to/transactions_YYYYMMDDHHMMSS_fail.csv"
decrypt_file(key, iv, enc_file, dec_file, tagFail)
```
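
Steps 1 and 2 of the download process (unzipping the output and locating its files) can be sketched as follows. This is an illustrative helper, not part of any Mastercard library: the name `extract_output` is hypothetical, and because the text above says only that there "may be" two output files, the sketch treats both the success and fail files as optional.

```python
import json
import zipfile
from pathlib import Path
from typing import Optional


def extract_output(zip_path: str, dest_dir: str) -> dict:
    """Extract an output zip and locate the metadata and result files."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()

    def find(suffix: str) -> Optional[Path]:
        # Return the first extracted file whose name ends with the suffix
        for name in names:
            if name.endswith(suffix):
                return dest / name
        return None

    metadata_file = find(".metadata.json")
    if metadata_file is None:
        raise FileNotFoundError("no .metadata.json file in " + zip_path)

    return {
        "metadata": json.loads(metadata_file.read_text()),
        "success": find("_success.bin"),  # None if absent
        "fail": find("_fail.bin"),        # None if absent
    }
```

The returned `metadata` dictionary holds the `jwe` and `fingerprint` fields, and the `success` and `fail` paths can be passed to a decryption routine such as `decrypt_file` above.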

## File Format {#file-format}

The CSV file you send to Mastercard must contain the externalCustomerId, accountType, description, and amount fields for each row. For best results, and to improve your ability to map our results back to your data, provide each of the fields listed below.
Warning: The externalCustomerId and externalAccountId fields allow you to map the transactions back to your data. **Do not send Mastercard plaintext representations of customer or account IDs**. When used, the representative IDs must be obfuscated through cryptographically strong hashing (we recommend using SHA-2 or SHA-3 methods). You can also use your own internal hash mapping.

|      Column Name      |                                                                                                                                                                                   Description                                                                                                                                                                                   |                                                                   Data type                                                                    |
|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| externalCustomerId    | Identifier for you to use to match the transactions back on your end. Mastercard will simply respond with the exact same ID. This should not be a real customer ID.                                                                                                                                                                                                             | String, up to 100 characters.                                                                                                                  |
| externalAccountId     | Identifier for you to use to match the transactions back on your end. Mastercard will simply respond with the exact same ID. This should not be a real account ID.                                                                                                                                                                                                              | String, up to 100 characters.                                                                                                                  |
| accountType           | A string that denotes the account type, e.g., savings, checking, etc. If "unknown" is provided, Data Enrichment will assume it is a checking account. An incorrect assumption could impact the results from Data Enrichment.                                                                                                                                                    | String, one of these possible types: "checking", "savings", "creditCard", "brokerageAccount", "investment", "healthSavingsAccount", "unknown". |
| postedTimestamp       | The date and time the transaction posted.                                                                                                                                                                                                                                                                                                                                       | String, up to 32 characters. Should be in ISO8601 format.                                                                                      |
| transactionTimestamp  | The date and time the transaction occurred.                                                                                                                                                                                                                                                                                                                                     | String, up to 32 characters. Should be in ISO8601 format.                                                                                      |
| description           | Description of the transaction.                                                                                                                                                                                                                                                                                                                                                 | String, up to 1024 characters.                                                                                                                 |
| memo                  | Memo of the transaction.                                                                                                                                                                                                                                                                                                                                                        | String, up to 511 characters.                                                                                                                  |
| amount                | Value amount for transaction.                                                                                                                                                                                                                                                                                                                                                   | Number (double).                                                                                                                               |
| directionIndicator    | The directionIndicator should be from the perspective of the account holder. If you always send us positive amount values, you MUST send us corresponding directionIndicator values to ensure the categorization logic works as intended. If you have internal logic to provide the amount field as either positive or negative, do not send us data in the directionIndicator. | String, either "Debit" or "Credit".                                                                                                            |
| transactionFee        | Transaction Fee for transaction.                                                                                                                                                                                                                                                                                                                                                | Number (double).                                                                                                                               |
| type                  | Type for transaction, such as debit or credit.                                                                                                                                                                                                                                                                                                                                  | String, up to 32 characters.                                                                                                                   |
| externalTransactionId | Transaction Id to link the transaction back to your system. While not required, this is **strongly** recommended. Mastercard does not guarantee the order of transactions in the input file will match the order of transactions in the output file.                                                                                                                            | String, up to 100 characters.                                                                                                                  |
| additionalDetails     | Each transaction can optionally include up to 30 sets of additional details as key-value pairs. For each additional detail you need to provide two consecutive columns, one with the key string and the following column with the value string.                                                                                                                                 | String, up to 100 characters for the key and 255 characters for the value.                                                                     |
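
The warning above about obfuscating IDs can be illustrated with a short sketch that hashes the customer and account IDs before writing a CSV. Several things here are assumptions rather than documented requirements: the helper names are hypothetical, the column subset and order are illustrative, and a keyed hash (HMAC with SHA-256, a member of the SHA-2 family) is used in place of a bare hash so that short IDs are harder to guess by brute force.

```python
import csv
import hashlib
import hmac
import io

# Column names taken from the table above; the subset and order are illustrative
COLUMNS = ["externalCustomerId", "externalAccountId", "accountType",
           "description", "amount", "externalTransactionId"]


def obfuscate_id(raw_id: str, secret: bytes) -> str:
    """Return a 64-character hex HMAC-SHA-256 digest of an internal ID."""
    return hmac.new(secret, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


def write_batch_csv(rows: list, secret: bytes, out) -> None:
    """Write rows to a CSV stream, hashing the customer and account IDs."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        safe = dict(row)
        safe["externalCustomerId"] = obfuscate_id(row["externalCustomerId"], secret)
        safe["externalAccountId"] = obfuscate_id(row["externalAccountId"], secret)
        writer.writerow(safe)


buffer = io.StringIO()
write_batch_csv(
    [{"externalCustomerId": "cust-001", "externalAccountId": "acct-001",
      "accountType": "checking", "description": "COFFEE SHOP 123",
      "amount": -4.5, "externalTransactionId": "txn-0001"}],
    secret=b"replace-with-a-secret-kept-on-your-side",
    out=buffer,
)
print(buffer.getvalue())
```

The 64-character digests fit within the 100-character limit for both ID fields. Because the same secret always yields the same digest for a given ID, you can map the returned rows back to your own records.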

## SFTP folder structure {#sftp-folder-structure}

Files sent for batch processing should be placed in your /input folder. The result file will appear in your /output folder once it has been fully processed (i.e., it is safe to download the file as soon as it appears in the /output folder).

As an approximate rule, you should allow for 10 minutes per million rows of transactions, plus 15 minutes, before checking the output folder:

* 1,000,000 rows --- allow at least 25 minutes
* 2,000,000 rows --- allow at least 35 minutes
* 3,000,000 rows --- allow at least 45 minutes
* 4,000,000 rows --- allow at least 55 minutes
* 5,000,000 rows --- allow at least 65 minutes
* 6,000,000 rows --- allow at least 75 minutes
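
The rule of thumb above (10 minutes per million rows, plus 15 minutes) is simple enough to express as a small helper, for example to schedule a polling job. The function name is hypothetical, and rounding partial millions up to the next whole minute is an assumption; the source lists whole millions only.

```python
def minutes_before_checking(row_count: int) -> int:
    """Estimate the wait: 10 minutes per million rows, plus 15 minutes."""
    if not 0 < row_count <= 6_000_000:
        raise ValueError("a batch file holds between 1 and 6,000,000 rows")
    # 10 minutes per 1,000,000 rows is 1 minute per 100,000 rows;
    # integer ceiling division rounds partial minutes up.
    processing = (row_count + 99_999) // 100_000
    return processing + 15


print(minutes_before_checking(2_500_000))  # 40
```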

### File naming conventions {#file-naming-conventions}

You should name your zip file and contents using the following format:

* Zip Name: `transactions_YYYYMMDDHHMMSS.zip`

  Zip Contents:
  * Encrypted file: `transactions_YYYYMMDDHHMMSS.bin`
  * Metadata file: `transactions_YYYYMMDDHHMMSS.metadata.json`

The zip file returned will be in the following form:

* Zip Name: `transactions_YYYYMMDDHHMMSS_output.zip`

  Zip Contents:
  * Encrypted Result files:
    * `transactions_YYYYMMDDHHMMSS_success.bin`
    * `transactions_YYYYMMDDHHMMSS_fail.bin`
  * Metadata file: `transactions_YYYYMMDDHHMMSS.metadata.json`
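
As a sketch of the naming conventions above, the helper below derives every file name from a single timestamp. The function name is hypothetical. The source does not state which time zone the timestamp should use, nor that the output zip reuses the input timestamp; the example assumes UTC and assumes the timestamp is shared.

```python
from datetime import datetime, timezone


def batch_file_names(moment: datetime) -> dict:
    """Build input and output file names for one batch from a timestamp."""
    stamp = moment.strftime("%Y%m%d%H%M%S")
    base = f"transactions_{stamp}"
    return {
        # Files you send
        "zip": f"{base}.zip",
        "data": f"{base}.bin",
        "metadata": f"{base}.metadata.json",
        # Files you receive (assuming the same timestamp is reused)
        "output_zip": f"{base}_output.zip",
        "success": f"{base}_success.bin",
        "fail": f"{base}_fail.bin",
    }


names = batch_file_names(datetime(2024, 3, 5, 14, 30, 0, tzinfo=timezone.utc))
print(names["zip"])  # transactions_20240305143000.zip
```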

Structure of metadata.json:

```json
{
  "jwe": "generated JWE",
  "iv": "base64 encoded string of iv bytes",
  "tag": "base64 encoded string of tag bytes"
}
```

## Example File Input and Output {#example-file-input-and-output}

We provide example CSV files for input (before data enrichment) and output (after data enrichment) below.

* Example input file:
  [data-enrichment-batch-input.csv](https://static.developer.mastercard.com/content/open-finance-us/uploads/batch/data-enrichment-batch-input.csv) (5KB)

* Example output file:
  [data-enrichment-batch-output.csv](https://static.developer.mastercard.com/content/open-finance-us/uploads/batch/data-enrichment-batch-output.csv) (12KB)

