While exploring the latest data warehousing technologies and their
concepts, I found that Amazon has also jumped into the Massively Parallel
Processing (MPP) battle. That prompted me to write a brief overview of some of
the vendors with MPP offerings on their plate. Data has been growing at a fast
pace, and older database management systems need to upgrade their technologies
to keep up. This huge amount of data has to be processed quickly to extract the
value hidden in it.
Massively parallel processing (MPP) is an architecture that allows this new
class of warehouse to split large data analytics jobs into smaller, more
manageable chunks, which are then distributed to multiple processors. Put
simply, MPP is a type of computing that uses many separate CPUs running in
parallel to execute a single program. Systems with hundreds or thousands of
such processors are called massively parallel. Let’s take a brief look at
what some of the vendors are offering on their plates:
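Before looking at individual vendors, the split, scatter and gather idea described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any of the products below work: the function names (`partial_sum`, `mpp_sum`) are hypothetical, threads stand in for the separate processors of an MPP system, and the "analytics job" is just a sum. Because of CPython's global interpreter lock, these threads will not actually speed up CPU-bound work; the point is the shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one 'processor' (hypothetical): aggregate only its own chunk."""
    return sum(chunk)


def mpp_sum(values, workers=4):
    """Split one job into chunks, scatter them to workers, gather the partials."""
    # Split: deal the input out round-robin, one chunk per worker.
    chunks = [values[i::workers] for i in range(workers)]
    # Scatter: every worker runs the same function on its own chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Gather: combine the small partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    sales = list(range(1, 1001))
    print(mpp_sum(sales))  # 500500
```

Each worker sees only its own chunk and the final step only combines a handful of partial results, which is why the pattern scales out: adding processors shrinks each chunk without changing the program.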
Amazon – Amazon Redshift enables customers to obtain dramatically
increased query performance when analyzing datasets ranging in size from
hundreds of gigabytes to a petabyte or more, using the same SQL-based business
intelligence tools they use today. Redshift has a massively parallel processing
(MPP) architecture, which enables it to distribute and parallelize queries
across multiple low-cost nodes. The nodes themselves are designed specifically
for data warehousing workloads. They contain large amounts of locally attached
storage on multiple spindles and are connected by a minimally oversubscribed
10 Gigabit Ethernet network. Redshift runs ParAccel's PADB, a high-performance,
columnar, compressed DBMS, and scales to 100 8XL nodes, or 1.6 PB of compressed
data. XL nodes have 2 virtual cores and 15 GB of memory, while 8XL nodes have
16 virtual cores and 120 GB of memory and run on 10 Gigabit Ethernet. In
Amazon's case, though, cost is what has played the major role. Using the AWS
Management Console, customers can launch a Redshift cluster that starts in the
gigabyte range and scales to more than a petabyte, for less than $1,000 per
terabyte per year. That is cheap in data warehousing terms, compared with the
roughly $25,000 per terabyte per year typically charged for on-premises
deployments. Cost should never be the only deciding factor, however, because
despite its benefits the offering has potential downsides: your data will sit
outside the corporate firewall and, to some extent, outside your control; there
are bandwidth and security costs to account for; and migrating to Redshift
could also mean moving your applications into other parts of the AWS
ecosystem. These are just my assumptions based on what I have gathered from
various articles. Let’s wait for a hands-on or technical review to get more
clarity about the reality.
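The cost and capacity figures in the Redshift paragraph are easy to sanity-check with a little arithmetic. In this sketch the per-terabyte prices are simply the approximate figures quoted above (not current AWS pricing), the 10 TB warehouse size is a hypothetical example, and 1.6 PB is taken as 1,600 TB using decimal units.

```python
# Approximate per-terabyte annual prices as quoted in the post (assumptions,
# not current vendor pricing).
REDSHIFT_USD_PER_TB_YEAR = 1_000   # "less than $1,000", i.e. an upper bound
ON_PREM_USD_PER_TB_YEAR = 25_000   # "around $25,000"


def yearly_costs(terabytes):
    """Return (redshift, on_prem, ratio) for a warehouse of the given size."""
    redshift = terabytes * REDSHIFT_USD_PER_TB_YEAR
    on_prem = terabytes * ON_PREM_USD_PER_TB_YEAR
    return redshift, on_prem, on_prem / redshift


def tb_per_node(total_tb=1_600, nodes=100):
    """Compressed capacity per node: 1.6 PB (1,600 TB) over 100 8XL nodes."""
    return total_tb / nodes


if __name__ == "__main__":
    redshift, on_prem, ratio = yearly_costs(10)  # hypothetical 10 TB warehouse
    print(f"Redshift: ${redshift:,}/yr, on-premises: ${on_prem:,}/yr, ratio: {ratio:.0f}x")
    print(f"Capacity per 8XL node: {tb_per_node():.0f} TB")
    # Redshift: $10,000/yr, on-premises: $250,000/yr, ratio: 25x
    # Capacity per 8XL node: 16 TB
```

Since the Redshift figure is an upper bound, the 25x ratio is, at these quoted rates, a lower bound on the price gap.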
Microsoft – Microsoft has long been working on an MPP data warehousing
solution. It introduced an MPP architecture with SQL Server 2008 in the form
of an appliance called Microsoft Parallel Data Warehouse (PDW). More
recently, a refreshed version has appeared with a new data processing engine
that includes PolyBase, a technology that can handle both relational and
non-relational data. PolyBase is expected to run on Microsoft's distribution
of Hadoop and could be something of a revolution in the data warehousing
market. The newly introduced in-memory technology Hekaton also looks set to
make Microsoft a strong competitor in the DW market. Microsoft’s all-time hit,
the Office suite and especially Excel, can easily be integrated with these
products, giving end users a familiar, easy-to-use interface for their own
analytics. Since these solutions are not yet fully available, we will have to
wait until they are live in the market and then get hands-on experience.
Greenplum – Like these vendors, Greenplum also offers MPP as part of its
data warehousing solution, with the additional capability of automatically
parallelizing data loading and queries. It uses a technology known as
Scatter/Gather Streaming, which is claimed to load data at around 10 terabytes
per hour, per rack, with linear scalability. The data is automatically
partitioned across all nodes of the system, and queries are planned and
executed using all nodes working together in a highly coordinated fashion.
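The "partition the data across all nodes, then have every node work on the query" idea can be illustrated with a generic hash-distribution sketch. This is not Greenplum's actual implementation: the segment count, column names and helper functions are hypothetical, and CRC32 is used only because Python's built-in `hash()` for strings is randomized per process, while the sketch needs placement that is stable across runs. Greenplum tables do let you declare a distribution key with `DISTRIBUTED BY (...)`, and the sketch mimics that by hashing a chosen key column.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segment nodes


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Pick a segment for a distribution-key value using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def scatter(rows, key_col, num_segments=NUM_SEGMENTS):
    """Distribute rows so that equal key values always share a segment."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_col], num_segments)].append(row)
    return segments


def local_group_sum(segment_rows, key_col, val_col):
    """Work one segment does on its own rows: a local GROUP BY ... SUM."""
    totals = defaultdict(int)
    for row in segment_rows:
        totals[row[key_col]] += row[val_col]
    return dict(totals)


def gather(partials):
    """Merge per-segment results into the final answer."""
    result = {}
    for part in partials:
        for key, total in part.items():
            # With distribution on the grouping key each key appears in only
            # one partial, but adding keeps the merge correct either way.
            result[key] = result.get(key, 0) + total
    return result


if __name__ == "__main__":
    rows = [
        {"region": "east", "amount": 10},
        {"region": "west", "amount": 5},
        {"region": "east", "amount": 7},
        {"region": "north", "amount": 3},
    ]
    segments = scatter(rows, "region")
    partials = [local_group_sum(seg, "region", "amount") for seg in segments]
    print(sorted(gather(partials).items()))
    # [('east', 17), ('north', 3), ('west', 5)]
```

Because all rows sharing a grouping key land on the same segment, each segment can compute its part of a `GROUP BY` independently, and the gather step only has to merge small per-segment results.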
IBM Netezza – IBM has also hit the market with its new Netezza appliance,
aiming to revolutionize the DW market. Netezza combines Field Programmable
Gate Arrays (FPGAs) with multi-core CPUs and claims to deliver better than
expected performance. It operates concurrently on the data stream in a
pipelined fashion, maximizing utilization and extracting the most throughput
from each MPP node. According to IBM, this delivers linear scalability to more
than a thousand processing streams executing in parallel, while offering a very
economical total cost of ownership.
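Pipelined, streaming execution like the one described for Netezza can be illustrated with Python generators. This is only an analogy: Netezza does this work in FPGA hardware and across many nodes, whereas this sketch runs plain Python stages (the hypothetical `scan`, `restrict` and `project`) on one machine, processing the data slices one after another rather than in parallel.

```python
from itertools import chain


def scan(slice_rows):
    """Stage 1 (hypothetical): read records from one data slice, one at a time."""
    for row in slice_rows:
        yield row


def restrict(rows, predicate):
    """Stage 2: drop records that fail a WHERE-style predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Stage 3: keep only the columns the query needs."""
    for row in rows:
        yield {col: row[col] for col in columns}


def pipeline(slice_rows):
    # Generators are lazy: each record passes through all three stages
    # before the next record is read, so no stage waits for a full slice.
    return project(restrict(scan(slice_rows), lambda r: r["amount"] > 100),
                   ["id", "amount"])


def run_query(slices):
    # One independent pipeline per slice. On a real MPP system these would
    # run concurrently on separate nodes; here they run one after another.
    return list(chain.from_iterable(pipeline(s) for s in slices))


if __name__ == "__main__":
    slices = [
        [{"id": 1, "amount": 50, "region": "east"},
         {"id": 2, "amount": 150, "region": "west"}],
        [{"id": 3, "amount": 300, "region": "east"}],
    ]
    print(run_query(slices))
    # [{'id': 2, 'amount': 150}, {'id': 3, 'amount': 300}]
```

Because the stages are lazy, a record is filtered and trimmed as soon as it is read, so calling `next()` on a pipeline returns the first matching row without reading the rest of the input.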
This blog is based entirely on my understanding from various articles and news. The information here is just my personal opinion and does not reflect anyone else’s views. The list goes on and on, covering most of the data warehousing vendors competing in the market. I will explore other vendors and share my views on them later.