Why Cloud?

Cloud computing means storing and accessing data and programs over the internet instead of on your own physical hard drives. Your data lives on low-cost machines behind highly available networks, so you can reach it and work with it from anywhere. This removes the burden of maintaining physical servers on-premises, and the highly available network lets you access your data anytime, anywhere, without interruption.

Shortcomings Of Traditional Data Warehouses:

  • Cost And Provisioning:
    • Holding huge volumes of data in a traditional data warehouse means purchasing more physical servers, and keeping them running within a single organizational unit is expensive.
  • Scalability:
    • As data grows, the warehouse must scale to keep the data ready for analysis. Doing this on-premises is time-consuming and costly.
  • Parallel Processing:
    • Processing such huge amounts of data in a traditional data warehouse is expensive, since many servers must be provisioned to hold the organization's data. Querying historical data is both slow and costly.
  • Upgrades:
    • Upgrades to a traditional data warehouse must be performed manually, as must tuning and backups.

All of these shortcomings of the traditional data warehouse can be addressed by moving to a cloud data warehouse.

Google Cloud Platform

Among the cloud vendors, Google Cloud offers comparatively cheap and fast compute instances, and most of its services run on top of them. You pay only for what you use, and all Google Cloud services run on a global network.

Predera Customer Use Cases:

Here is how Predera helped one of its clients move from a traditional on-premises analytical data warehouse to a Google Cloud data warehouse.

Challenge:

  • One of our clients had been running a traditional data warehouse on 3 machines, each with an 8-drive hardware RAID 10 array (a high-end RAID configuration), sized for their data stream.
  • Even though it was a massively parallel, scalable warehouse, complicated analytical queries were slow and expensive.
  • Productivity dropped as the number of analytical requests grew.
  • The system was not achieving the required levels of performance and was becoming painfully expensive for the organization to support.

Results:

  • They needed to run their analytical queries and stored procedures on both streaming and historical data to generate daily reports seamlessly.
  • They required a scalable, high-performance data warehouse that was also cheap for storing data and for running complicated procedures.
  • We explored their requirements and helped move their data and procedures to GCP, which is cost-efficient: you pay only for the services you use, and for data egress but not ingress.

Solution:

i) We moved all the data from their traditional data warehouse to GCP BigQuery.

Google BigQuery:

  • It is a serverless, columnar data warehouse for analytics using SQL.
  • It scales automatically and can scan terabytes in seconds and petabytes in minutes.
  • Storage is cheap, and on-demand queries are billed by the amount of data they scan.
  • It caches query results, so re-running the same query returns the cached result instead of scanning the data again.
  • Streaming inserts into BigQuery are billed per GB.
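As a rough illustration of the on-demand pricing model described above, here is a minimal Python sketch that estimates a query's cost from the bytes it scans. The per-TB rate used here is a placeholder assumption for illustration, not Google's actual price list; check the current BigQuery pricing page for real rates.

```python
# Sketch of BigQuery-style on-demand query pricing:
# cost is proportional to the data a query scans.
PRICE_PER_TB = 5.00  # USD per TB scanned -- hypothetical rate for illustration

def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the on-demand cost of a query from the bytes it scans."""
    tb_scanned = bytes_scanned / 1024 ** 4  # bytes -> TiB
    return tb_scanned * PRICE_PER_TB

# Under this model, a query scanning 500 GB costs roughly:
cost = estimate_query_cost(500 * 1024 ** 3)
print(f"${cost:.2f}")  # → $2.44
```

This is also why the result cache and partition pruning matter: a cached or pruned query scans fewer bytes, so under scan-based pricing it costs less.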

ii) We helped port all their archived data, queries, and procedures to the BigQuery data warehouse.

iii) Most of their procedures and queries were complicated and expensive to run; by exploiting BigQuery's strengths, we made them cost-efficient.

iv) Custom date columns were provided so that operational teams can run reports for their own respective dates.

v) We optimized the operational teams' procedures and analytical queries for report generation:

  • Partitioning: Partitioning tables by date lets queries scan only the relevant partitions instead of the full migrated dataset, reducing scan costs.
  • SQL Functions: User-defined functions and BigQuery standard functions replaced logic that had previously been repeated across queries.
  • Views: Views were added to the warehouse for queries that need to run over streaming data; these are scanned daily via the scheduling tool Airflow.
  • Custom Querying: We added a feature that lets operational teams work on the data seamlessly by supplying their own custom dates.
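To show why date partitioning cuts scan costs, here is a small self-contained Python simulation (the table layout and column names are hypothetical, not the client's schema). Rows are grouped into per-day partitions; a query that filters on the partition date reads only the matching partition, while an unpartitioned table must scan every row to apply the same filter.

```python
from collections import defaultdict
from datetime import date

# Simulate a date-partitioned table: rows are stored per partition date.
partitions = defaultdict(list)

def insert(row_date, row):
    partitions[row_date].append(row)

def partitioned_query(filter_date):
    """With partitioning, only the partition for filter_date is read.
    Returns (matching rows, number of rows scanned)."""
    rows = partitions[filter_date]
    return rows, len(rows)

def full_scan_query(filter_date):
    """Without partitioning, every row is scanned to apply the filter."""
    all_rows = [(d, r) for d, rows in partitions.items() for r in rows]
    matches = [r for d, r in all_rows if d == filter_date]
    return matches, len(all_rows)

insert(date(2020, 1, 1), {"amount": 10})
insert(date(2020, 1, 1), {"amount": 20})
insert(date(2020, 1, 2), {"amount": 30})

_, scanned_partitioned = partitioned_query(date(2020, 1, 1))
_, scanned_full = full_scan_query(date(2020, 1, 1))
print(scanned_partitioned, scanned_full)  # → 2 3
```

Both queries return the same two rows, but the partitioned version scans only the rows for that date. At warehouse scale, with billing proportional to bytes scanned, that difference is where the cost savings come from.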

Summary

BigQuery is not a solution to everything.

Pros

  • Collaboration
  • Shorter ETL times
  • Less maintenance
  • Lower setup costs

Cons of moving into BigQuery

  • Costs can be high
  • Adapting the queries and optimizing them takes time