{"id":69496,"date":"2022-06-21T09:15:18","date_gmt":"2022-06-21T09:15:18","guid":{"rendered":"https:\/\/www.globallogic.com\/?post_type=insightsection&#038;p=69496"},"modified":"2025-01-27T10:14:45","modified_gmt":"2025-01-27T10:14:45","slug":"evolution-of-data-analytics-technologies-part-2","status":"publish","type":"insightsection","link":"https:\/\/www.globallogic.com\/insights\/blogs\/evolution-of-data-analytics-technologies-part-2\/","title":{"rendered":"Evolution of Data &amp; Analytics Technologies (Part 2)"},"content":{"rendered":"<div class=\"classic_editor_content\"><p><span style=\"font-weight: 400\">In<\/span><a href=\"https:\/\/www.globallogic.com\/insights\/blogs\/the-evolution-of-data-analytics-technologies\/\"> <span style=\"font-weight: 400\">part 1<\/span><\/a><span style=\"font-weight: 400\"> of this blog series, we looked at how data platforms, data processing technologies, and data architecture have evolved. Here in part 2, we\u2019ll look at the evolution of data application development, data formats, and data storage.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Data Application Development Evolution<\/span><\/h3>\n<table>\n<tbody>\n<tr>\n<td><b>Programming-Based \u2192 Scripting \u2192 SQL-Like \u2192 Low\/No-Code UI<\/b><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400\">Initially, data engineers wrote most data applications for early big data ecosystem projects such as Apache Hadoop in general-purpose programming languages like Java. 
This was because these frameworks exposed Java or Scala APIs for creating and deploying data applications.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Soon after, higher-level scripting languages and DSLs such as Apache Pig for Hadoop and Scalding for Cascading let data engineers and analysts develop jobs more easily, without hand-coding them in the underlying language.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Because SQL is so widely known among data analysts and data scientists, SQL and SQL-like interfaces such as Apache Hive for Hadoop, CQL for Cassandra, and Apache Phoenix for HBase became prominent, and they remain widely used by data engineers and analysts alike.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Today, facing a shortage of data engineers and analysts, enterprises are increasingly turning to UI-based development to reduce implementation complexity and improve productivity. 
The trend is therefore toward low-code or no-code, UI-based tools such as AWS Glue, Azure Data Factory, Prophecy.ai, and<\/span><a href=\"https:\/\/www.globallogic.com\/services\/offerings\/digital-accelerators\/dpa\/\"><span style=\"font-weight: 400\"> GlobalLogic Data Platform<\/span><\/a><span style=\"font-weight: 400\">, which shorten the learning curve for data engineers and accelerate development for enterprises.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-69497\" src=\"https:\/\/www.globallogic.com\/wp-content\/uploads\/2022\/06\/Evolution-of-Data-On-Page.jpg\" alt=\"\" width=\"1024\" height=\"498\" srcset=\"https:\/\/www.globallogic.com\/wp-content\/uploads\/2022\/06\/Evolution-of-Data-On-Page.jpg 1024w, https:\/\/www.globallogic.com\/wp-content\/uploads\/2022\/06\/Evolution-of-Data-On-Page-300x146.jpg 300w, https:\/\/www.globallogic.com\/wp-content\/uploads\/2022\/06\/Evolution-of-Data-On-Page-768x374.jpg 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<h3><span style=\"font-weight: 400\">Data Formats Evolution<\/span><\/h3>\n<table>\n<tbody>\n<tr>\n<td><b>Text \/ Binary Formats \u2192 Custom Formats \u2192 Columnar Formats \u2192 In-Memory Columnar &amp; High-Performance Formats<\/b><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400\">In the beginning, most data was stored in the Hadoop Distributed File System (HDFS) as text files or in binary formats like SequenceFile or RCFile. Text formats such as CSV and JSON are human-readable, but they consume a lot of storage space and perform poorly on large volumes of data.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Subsequently, open-source data serialization formats like Apache Avro and Google Protocol Buffers (Protobuf) emerged for serializing structured data. They provide rich data structures and a compact, fast binary encoding. 
These formats are still widely used for storing and exchanging data.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Next came columnar file formats like Apache ORC and Apache Parquet, which offer better data compression and schema evolution handling, followed by table formats like Delta Lake and Apache Hudi that build on them. ORC (through Hive ACID tables), Delta Lake, and Hudi can also support ACID transactions to handle data updates and change streams.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Columnar formats and storage systems are already the most widely used across enterprises. Looking ahead, the trend is toward in-memory columnar formats like Apache Arrow and high-performance formats like Apache Iceberg and Apache CarbonData, which provide efficient compression and encoding schemes and better performance for processing complex data in bulk. Iceberg tables store their data in Parquet, ORC, or Avro files, and Arrow data is typically loaded from Parquet or ORC files, so both remain compatible with data already stored in those formats.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Data Storage Evolution<\/span><\/h3>\n<table>\n<tbody>\n<tr>\n<td><b>HDFS \u2192 Hive \u2192 NoSQL \/ NewSQL \u2192 Cloud Data Warehouses + Blob Storage<\/b><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400\">HDFS was the initial distributed, file-based storage system of the big data ecosystem, allowing engineers to store large amounts of data on commodity hardware. Engineers ran MapReduce programs directly on the files stored in HDFS.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Apache Hive and Apache HBase followed, providing a table-like view of the underlying data; Hive in particular let developers run SQL-like queries on it.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Soon after, several NoSQL databases emerged, each with a different data model (wide-column, key-value, document, graph, and so on) suited to specific use cases. 
Popular examples include Apache Cassandra, MongoDB, Apache CouchDB, Neo4j, and Memcached among open-source options, and Amazon DynamoDB, Azure Cosmos DB, and Google Cloud Bigtable among commercial offerings.<\/span><\/p>\n<p><span style=\"font-weight: 400\">During this period, NewSQL databases emerged, aiming to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while keeping the relational model and ACID guarantees of a traditional RDBMS. NewSQL databases include Amazon Aurora, Google Cloud Spanner, CockroachDB, and YugabyteDB, among others.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Most cloud object stores can be accessed through HDFS-compatible connectors, and because this storage is serverless and fully managed, enterprises increasingly use it as their primary blob storage. The trend for the near future is therefore to use cloud blob storage such as Amazon S3, Azure Blob Storage\/ADLS, and Google Cloud Storage as the landing zone for ingested data. The data is then processed, and the aggregated results are persisted in cloud data warehouses such as Amazon Redshift, Azure Synapse Analytics, Google Cloud BigQuery, Snowflake, or Databricks Delta Lake.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Engineers will continue to use NoSQL databases where their specific data models fit the use case.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This concludes the second part of this blog series. We\u2019ll continue exploring the evolution of the data and analytics space in upcoming posts in this series.<\/span><\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Data analytics is a dynamic, rapidly shifting space. 
See how this evolution has impacted the way we develop applications and store data.<\/p>\n","protected":false},"author":26,"featured_media":69498,"parent":0,"menu_order":164,"template":"","insight":[41],"insight-subcats":[1924,1925],"insight-industry":[1783],"insight-services":[1916],"insight-partners":[],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insightsection\/69496"}],"collection":[{"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insightsection"}],"about":[{"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/types\/insightsection"}],"author":[{"embeddable":true,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/users\/26"}],"version-history":[{"count":1,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insightsection\/69496\/revisions"}],"predecessor-version":[{"id":111326,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insightsection\/69496\/revisions\/111326"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/media\/69498"}],"wp:attachment":[{"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/media?parent=69496"}],"wp:term":[{"taxonomy":"insight","embeddable":true,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insight?post=69496"},{"taxonomy":"insight-subcats","embeddable":true,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insight-subcats?post=69496"},{"taxonomy":"insight-industry","embeddable":true,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insight-industry?post=69496"},{"taxonomy":"insight-services","embeddable":true,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insight-services?post=69496"},{"taxonomy":"insight-partners","embeddable":true,"href":"https:\/\/www.globallogic.com\/wp-json\/wp\/v2\/insight-partners?post=69496"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}