What is Cloud and Why?
What made us think about cloud? Is it a new trending technology? or Are we just following the same as other companies do. What factors influenced companies to migrate from on-premise to cloud. As we are familiar with the word cloud computing which means storing and accessing the data and programs over the internet instead of your physical hard drives. Which means you have to move your data over low cost computers provides high availability networks to access your data and work with that. This helps you to reduce the maintaining of physical servers which you have done in on-premise and also with high availability network you can access your data anywhere anytime without any interruptibility.
Shortcomings Of Traditional Data Warehouses:
Cost And Provisioning:
To maintain huge data in traditional data warehouse we need to purchase more physical servers and also it is cost effective to maintain them into single organizational unit.
As your data is going to be huge we need to scale accordingly to make it ready for Analysis purpose. However it will be time consuming and costly to make your data scalable.
Processing such a huge amount of data in Traditional data warehouse will be cost effective as we need to provision many servers to maintain our organizational data. It will be time consuming and costly to query our historical data
Upgrades to the traditional data warehouses should be done manually and tuning and backup should also be done in the same way.
All these shortcomings in traditional data warehouse can be made strong using Cloud Data warehouses.
Google Cloud Platform:
Of all cloud distribution vendors google cloud distribution provides cheaper and faster computing instances and most of their services runs upon these computing instances. You can pay for what you have used and all the services from google cloud runs on global network.
Predera Customer UseCases:
Here is how predera helped one of the clients to move from their traditional data analytical warehouses to Google cloud data warehouse.
One of our clients has been using traditional data warehouse configured in 3 machines having 8 drive hardware RAID 10 each (which is high end RAID configuration) enough for their data stream.
Even it is massively parallel processing and scalable data warehouse it seems to be time consuming on complicated analytical queries and cost effective too.
Their productivity is less as there were increase in number of analytical requests. This system was not achieving the required levels of performance and becoming painfully expensive for the organizational support.
They need to run their analytical queries or stored procedures on the streaming data and also historical data to generate daily report seamlessly.
They require scalable and fast performance Data warehouse which should also be cheaper in storing data and querying complicated procedures.
We have explored their requirements and helped out to move their data, procedures to GCP as it is cost efficient and it only costs for service that we use and also for data egress not for ingress.
i) We have moved all the data from their traditional data warehouse to GCP Big Query
Google Big Query:
– It is serverless column store data warehouse for analytics using SQL.
– It scales internally and can scan TB in seconds and PB in minutes.
– Pay very less for storing data and pay for GB s data while querying.
– It also caches the results of a query and if you run the same query again it gets the cached result instead of scanning the query again.
– We need to pay when we stream data inserts per GB into Big Query.
ii) We helped out to port all their archived data, queries and procedures to GCP Big Query data warehouse.
iii) Most of their procedures and queries are complicated and cost effective and by utilizing the benefit of big query we made them cost efficient.
iv) Custom Date columns has been provided for operational teams to run for their own respective dates
v) We made those respective operational teams procedures and analytical queries in an optimized way which helps in generating reports:
Partitioning: Using Partition dates helps them to reduce the cost for scanning all those queries in migrated data.
SQL Functions: User Defined functions and big query standard functions have been added instead of using those logics in repetitive way.
Views: Respective Views has been added to the data warehouse if those queries to be scanned on stream, scan them using Scheduling Tool called AIRFLOW every day.
Custom Querying: We added a feature which helps operational teams to seamlessly work on the data by giving their respective custom dates.
Learnings from this exercise
BigQuery is not the solution for everything.
– Collaboration ;
– ETL time shortened ;
– Maintenance ;
– Setup costs
Cons of moving into BigQuery
– High Cost ;
– Adapting the queries and optimizing them takes time ;