
Daily Backups from Salesforce to Amazon S3 using Mule 4

Information is one of the most important assets right now. Companies pay a lot of money to keep their information safe and retained somewhere they can access it easily for analysis or disaster recovery.

In this post I will demonstrate how to create a simple application using Mule 4 that dynamically retrieves all custom objects, with all of their fields, from a Salesforce instance and creates a daily backup file in Amazon S3.

This version of the tutorial only covers custom objects from Salesforce and creates simple CSV files with all of the data. There is potential for a more sophisticated solution that selects objects and fields dynamically and compresses the backups, but for now the simple version should be enough.

Let's start with the Salesforce piece. First we need a Salesforce instance, a username, a password, and a generated security token; this will allow us to connect to Salesforce.
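A minimal sketch of the connector configuration, assuming basic username/password authentication (the property placeholders are just examples):

<salesforce:sfdc-config name="Salesforce_Config" doc:name="Salesforce Config">
	<!-- Username, password and the security token generated in Salesforce -->
	<salesforce:basic-connection
		username="${sfdc.username}"
		password="${sfdc.password}"
		securityToken="${sfdc.token}" />
</salesforce:sfdc-config>

With the connection in place, we will use the following operations from the Salesforce connector: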


Describe Global

This operation returns a list of all objects inside the Salesforce instance. It is important to mention that even though it returns information for everything, in this case we only need the custom created ones. For this we'll have a flow like this:
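A minimal sketch of that flow in XML (the flow name and cron expression are just examples, and namespace declarations are omitted):

<flow name="salesforce-daily-backup">
	<!-- Trigger the backup once a day -->
	<scheduler doc:name="Scheduler">
		<scheduling-strategy>
			<cron expression="0 0 2 * * ?" />
		</scheduling-strategy>
	</scheduler>
	<!-- Returns metadata for every sObject in the org -->
	<salesforce:describe-global config-ref="Salesforce_Config" doc:name="Describe global" />
	<!-- ...the rest of the steps described below continue here... -->
</flow>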

After calling the operation we use a DataWeave component to filter on the custom == true attribute:

%dw 2.0
output application/java
---
payload.sobjects filter $.custom == true

Describe sObject

Once we have the custom objects collected, we need to loop over each one of them and describe it, meaning we will get all fields and attributes as they exist in Salesforce. Personally I think this is a very powerful call, because we can do many things with this information: use it to create metadata formats, generate custom features outside of Salesforce to graphically show the relationships, or even better, build database schema definitions on the fly.
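A sketch of that loop, assuming the filtered list of custom objects is the current payload (the attribute carrying the sObject name on the Describe sObject operation is an assumption and may differ between connector versions):

<!-- Iterate over every custom object returned by Describe global -->
<foreach doc:name="For Each" collection="#[payload]">
	<!-- Remember the API name of the current object for later steps -->
	<set-variable variableName="sObjectName" value="#[payload.name]" />
	<!-- Pull every field and attribute of the current sObject -->
	<salesforce:describe-sobject config-ref="Salesforce_Config" type="#[vars.sObjectName]" doc:name="Describe sObject" />
	<!-- ...the remaining steps for this object run inside the loop... -->
</foreach>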

Once we have all the fields we need to apply one additional filter. Because the process uses Bulk API features, there are some restrictions on Location and Address fields, so we need to take them out of the picture. We filter them out like this:

%dw 2.0
output application/java
---
payload.fields filter ((value, index) -> (value.fieldType != "LOCATION" and value.fieldType != "ADDRESS"))

Next, we need to capture all the field names in a single string separated by commas in order to create the query we will use for the query job.

%dw 2.0
output application/java
---
(payload map {
	fieldName: $.name
}.fieldName default []) joinBy ","

Then we concatenate the field list with the sObject name to build the SOQL query:

%dw 2.0
output application/java
---
("Select " ++ payload ++ " from " ++ vars.sObjectName) default ""

Up to this point we have a query for a single sObject, so now we can start creating the jobs in Salesforce.

Create the Salesforce Job.

Creating the job is the next step; it gives us a small container that holds the operation we want to execute, in this case a query. So first we need to create a payload specifying the job request. It looks like this:

%dw 2.0
output application/java
---
{
	concurrencyMode:"Parallel",
	contentType:"CSV"
}

Then we need to set up the connector operation with this configuration:
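In XML it could look like the sketch below; the attribute and child element names are assumptions and may differ between connector versions:

<!-- Create the Bulk API job for the current sObject.
     The job request built in the previous step is passed as the content. -->
<salesforce:create-job config-ref="Salesforce_Config" doc:name="Create job"
	operation="query"
	type="#[vars.sObjectName]">
	<salesforce:create-job-request>#[payload]</salesforce:create-job-request>
</salesforce:create-job>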

In general we need to keep track of the job that has been generated, since we need to create a batch for the query after the job has been created. So we save the job id and pull the full job information using the Job info operation.
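A sketch of those two steps (the jobId attribute name on the Job info operation is an assumption):

<!-- Keep the id of the job we just created -->
<set-variable variableName="jobId" value="#[payload.id]" />
<!-- Pull the full job information back from Salesforce -->
<salesforce:job-info config-ref="Salesforce_Config" jobId="#[vars.jobId]" doc:name="Job info" />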

Finally, let's use the Create batch for query operation to inject the query into the job, so that Salesforce knows what information to extract.

Getting the batch info List.

This operation, based on the job id, allows us to pull the list of batches that have been generated in Salesforce for our query. We need to keep track of the ones that are in progress and the ones that are already finished. Once all batches are completed, meaning the data is ready to be downloaded, we call Batch result stream.
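A simple way to check whether everything is done, assuming the Batch info list payload exposes a batchInfo array with a state field, is a DataWeave expression like this:

%dw 2.0
output application/java
---
isEmpty(payload.batchInfo filter ($.state != "Completed"))

When this expression returns true every batch has reached the Completed state and we can move on; otherwise we keep polling, for example inside an Until Successful scope.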

Batch result stream returns the id of the result we need to pull. We use a DataWeave script like this to capture it:

%dw 2.0
ns ns0 http://www.force.com/2009/06/asyncapi/dataload
output application/java
---
payload.ns0#"result-list".ns0#result

Then we use the Query result stream operation to download the stream object:

At this point we are ready to push the information into Amazon.

Creating the object in Amazon S3.

Once we have the file we can define the folder structure we want for the object in S3. In this case there are a couple of things to have in place before pushing the file:

  1. An access key and secret to authenticate against Amazon S3.
  2. An already created bucket in Amazon S3; in this case I have one called emoran-backup.

Now, the key we will use to create the object with the Amazon S3 connector is:

now() as String {format: "yyyy-MM-dd"} ++ "/" ++ vars.sObjectName as String ++ ".csv"

This way we can keep a folder per day and all the files inside of it.
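The S3 piece could look like the sketch below; the element and attribute names are assumptions, so check them against your version of the Amazon S3 connector:

<!-- Amazon S3 connection; the credential placeholders are examples -->
<s3:config name="Amazon_S3_Configuration">
	<s3:connection accessKey="${s3.accessKey}" secretKey="${s3.secretKey}" />
</s3:config>

<!-- Write the downloaded CSV stream into the daily folder of the bucket,
     assuming the current payload is used as the object content -->
<s3:create-object config-ref="Amazon_S3_Configuration"
	bucketName="emoran-backup"
	key="#[now() as String {format: 'yyyy-MM-dd'} ++ '/' ++ vars.sObjectName as String ++ '.csv']"
	doc:name="Create object" />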

The S3 page will look like this:

You can download the project from the Git repository and use it with your own credentials.

Edgar Moran
