Let's suppose that we have a Google Analytics query like this, which however is subject to sampling by Google Analytics because the data are just too large.
SELECT visitors, newVisits, percentNewVisits, visitorType, visitCount, daysSinceLastVisit, userDefinedValue ,parseDate("date",'yyyyMMdd') as tdateFROM table( exec GoogleAnalytics1.get( profile=>'12345678', startDate=>'2013-08-01', endDate=>curdate(), metrics=>'visitors,newVisits,percentNewVisits', dimensions=>'date,visitorType,visitCount,daysSinceLastVisit,userDefinedValue' )) a
The solution is to get the data batchwise into a data warehouse table.
Step 1: execute this code. It gets the initial part of the data, for example one month. After this step, please check that the table dwh.google_history contains at least one data row.
SELECT visitors, newVisits, percentNewVisits, visitorType, visitCount, daysSinceLastVisit, userDefinedValue ,parseDate("date",'yyyyMMdd') as tdateINTO dwh.google_historyFROM table( exec GoogleAnalytics1.get( profile=>'12345678', startDate=>'2013-08-01', endDate=>'2013-09-01', metrics=>'visitors,newVisits,percentNewVisits', dimensions=>'date,visitorType,visitCount,daysSinceLastVisit,userDefinedValue' ) ) a
Step 2: create this view. Here, we set the size of the batch equal to 1 month.
SELECT visitors, newVisits, percentNewVisits, visitorType, visitCount, daysSinceLastVisit, userDefinedValue ,parseDate("date",'yyyyMMdd') as tdateFROM table( exec GoogleAnalytics1.get( profile=>'12345678', startDate=>( select max(tdate) as startdate from dwh.google_history ) , endDate=>( select timestampadd(SQL_TSI_MONTH, 1,max(tdate)) as enddate from dwh.google_history ), metrics=>'visitors,newVisits,percentNewVisits', dimensions=>'date,visitorType,visitCount,daysSinceLastVisit,userDefinedValue' ) ) a
Step 3: create a batch job on this view (that is, on the recommendation corresponding to this view). The target should be the same as the INTO table in the first step. Optionally, it makes sense to set identity to date or year and month of the date to cleanly update the border value between batches.