




Distributed Computing

Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple nodes that communicate over a network. The computers at each node interact with each other to accomplish a shared goal.

Google Code University - Introduction to Distributed System Design

In other words, distributed computing uses distributed systems to solve problems.


[Figure: Distributed-parallel. Image source: wiki]







Three-tier Architecture

Three-tier architecture is a client-server architecture in which the presentation, the application processing, and the data management are logically separate processes.

For example, an application that uses middleware to service data requests between a user and a database employs a multi-tier architecture. The most widespread form of multi-tier architecture is the three-tier architecture.
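The separation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (`DataTier`, `ApplicationTier`, `PresentationTier`), the in-memory dictionary standing in for a database, and the greeting logic are all hypothetical, and in a real deployment each tier would typically run as its own process.

```python
# Data tier: owns storage; no other tier touches the rows directly.
class DataTier:
    def __init__(self):
        self._rows = {1: "Ada", 2: "Grace"}  # hypothetical stand-in for a database

    def fetch_name(self, user_id):
        return self._rows.get(user_id)


# Application tier (the middleware): business rules only,
# with no storage details and no display formatting.
class ApplicationTier:
    def __init__(self, data):
        self._data = data

    def greeting_for(self, user_id):
        name = self._data.fetch_name(user_id)
        if name is None:
            return {"ok": False, "message": "unknown user"}
        return {"ok": True, "message": f"Hello, {name}"}


# Presentation tier: turns application results into something a user sees.
class PresentationTier:
    def __init__(self, app):
        self._app = app

    def render(self, user_id):
        result = self._app.greeting_for(user_id)
        prefix = "" if result["ok"] else "Error: "
        return prefix + result["message"]


ui = PresentationTier(ApplicationTier(DataTier()))
print(ui.render(1))
print(ui.render(99))
```

Because each tier talks only to the one directly below it, the data tier could be swapped for a real database without changing the presentation code.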

[Figure: three-tier architecture. Image source: wiki]





MapReduce

MapReduce is a framework for processing huge datasets on certain kinds of distributable problems using a large number of nodes, collectively referred to as a cluster. Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured).

It has two steps:

  • Map step:
    The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.

  • Reduce step:
    The master node then takes the answers to all the sub-problems and combines them to produce the output: the answer to the problem it was originally trying to solve.
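The two steps above can be sketched as a word count, the usual introductory example. This is a single-process illustration, not a real cluster framework: the function names (`split_input`, `map_worker`, `reduce_step`, `word_count`) are hypothetical, and the "workers" run one after another here, whereas on a cluster they would run in parallel on different nodes.

```python
from collections import defaultdict


def split_input(text, n_workers):
    """Master: chop the input into one chunk of lines per worker."""
    lines = text.splitlines()
    return [lines[i::n_workers] for i in range(n_workers)]


def map_worker(lines):
    """Worker: solve the sub-problem (count the words in its own chunk)."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def reduce_step(partials):
    """Master: combine the partial answers into the final answer."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)


def word_count(text, n_workers=3):
    chunks = split_input(text, n_workers)
    partials = [map_worker(chunk) for chunk in chunks]  # parallel on a real cluster
    return reduce_step(partials)


result = word_count("the quick fox\nthe lazy dog\nthe end")
print(result["the"])  # 3
```

The sketch works because counting is associative: partial counts can be added in any order, which is what makes this kind of problem distributable.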




Service-oriented Architecture (SOA)

SOA defines how to integrate widely disparate applications for a Web-based world that uses multiple implementation platforms.
Rather than defining an API, SOA defines the interface in terms of protocols and functionality. An endpoint is the entry point for such an SOA implementation.

Service-orientation requires loose coupling of services with operating systems and other technologies that underlie applications.
SOA separates functions into distinct units, or services, which developers make accessible over a network in order to allow users to combine and reuse them in the production of applications.
These services and their corresponding consumers communicate with each other by passing data in a well-defined, shared format, or by coordinating an activity between two or more services.






Distributed Hashing

A distributed hash table (DHT) is a class of decentralized distributed system that provides a lookup service similar to a hash table; (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key.

Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

DHTs characteristically emphasize the following properties:

  • Decentralization:
    the nodes collectively form the system without any central coordination.
  • Scalability:
    the system should function efficiently even with thousands or millions of nodes.
  • Fault tolerance:
    the system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.
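One common way to get the "minimal disruption" property is consistent hashing, in which nodes and keys are hashed onto the same ring and each key belongs to the next node around the ring. The sketch below illustrates only that key-to-node mapping. It is an assumption-laden simplification: the `HashRing` class and node names are hypothetical, the ring lives in one process, and a real DHT also needs routing between nodes, replication, and failure detection.

```python
import bisect
import hashlib


def ring_position(name):
    """Hash a node name or a key to a point on the ring (a large integer)."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of all nodes
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        point = ring_position(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def remove_node(self, node):
        point = ring_position(node)
        self._points.remove(point)
        del self._owner[point]

    def node_for(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        i = bisect.bisect_left(self._points, ring_position(key))
        return self._owner[self._points[i % len(self._points)]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # a new participant joins
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys changed owner")
```

Every key that changes owner moves to the newly added node; keys owned by the other nodes stay where they were. With a naive `hash(key) % number_of_nodes` scheme, by contrast, adding a node would reassign most keys.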

[Figure: distributed hash table. Image source: wiki]





P2P

P2P, or peer-to-peer, file sharing allows users to download files such as music, movies, and games using a P2P software client that searches for other connected computers.

Gnutella's P2P - 2nd Generation P2P

Peer-to-peer file sharing is different from traditional file downloading. In peer-to-peer sharing, we use a software program (rather than our Web browser) to locate computers that have the file we want. Because these are ordinary computers like ours, as opposed to servers, they are called peers. The process works like this:

  • We run peer-to-peer file-sharing software such as Gnutella on our computer and send out a request for the file we want to download.
  • To locate the file, the software queries other computers that are connected to the Internet and running the file-sharing software.
  • When the software finds a computer that has the file we want on its hard drive, the download begins. Others using the file-sharing software can obtain files they want from our computer's hard drive.
  • The file-transfer load is distributed between the computers exchanging files, but file searches and transfers from our computer to others can cause bottlenecks. Some people download files and immediately disconnect without allowing others to obtain files from their system, which is called leeching. This limits the number of computers the software can search for the requested file.
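The search step described above can be sketched as a query that floods outward from our computer through its neighbors. This is a toy model, not the Gnutella wire protocol: the `flood_query` function, the peer names, and the hop limit `ttl` are assumptions made for illustration.

```python
from collections import deque


def flood_query(peers, files, start, wanted, ttl):
    """Flood a query breadth-first from `start`; return peers that hold `wanted`.

    peers: dict mapping each peer to the list of peers it is connected to
    files: dict mapping a peer to the set of file names it shares
    ttl:   maximum number of hops the query may travel
    """
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        peer, hops = queue.popleft()
        if peer != start and wanted in files.get(peer, set()):
            hits.append(peer)
        if hops == ttl:
            continue  # the query has expired and is not forwarded further
        for neighbor in peers.get(peer, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return hits


peers = {
    "us": ["p1", "p2"],
    "p1": ["us", "p3"],
    "p2": ["us"],
    "p3": ["p1", "p4"],
    "p4": ["p3"],
}
files = {"p2": {"song.mp3"}, "p4": {"song.mp3"}}

print(flood_query(peers, files, "us", "song.mp3", ttl=2))  # ['p2']
print(flood_query(peers, files, "us", "song.mp3", ttl=3))  # ['p2', 'p4']
```

In this model, a peer that disconnects (for example `p3`) also cuts off every peer reachable only through it, which is one way to see why disconnecting early shrinks the set of computers the software can search.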




BitTorrent's P2P - 3rd Generation P2P

Unlike some other peer-to-peer downloading methods, BitTorrent is a protocol that offloads some of the file tracking work to a central tracker server. Another difference is that it uses a principle called tit-for-tat. This means that in order to receive files, we have to give them. This addresses the problem of leeching, which was one of developer Bram Cohen's primary goals. With BitTorrent, the more files we share with others, the faster our downloads are. Finally, to make better use of available Internet bandwidth (the pipeline for data transmission), BitTorrent downloads different pieces of the file we want simultaneously from multiple computers.

Here's how it works:

  • We open a Web page and click on a link for the file we want.
  • BitTorrent client software communicates with a tracker to find other computers running BitTorrent that have the complete file and those with a portion of the file (peers that are usually in the process of downloading the file).
  • The tracker identifies the swarm, which is the connected computers that have all of or a portion of the file and are in the process of sending or receiving it.
  • The tracker helps the client software trade pieces of the file we want with other computers in the swarm. Our computer receives multiple pieces of the file simultaneously.
  • If we continue to run the BitTorrent client software after our download is complete, others can receive pieces of the file from our computer; our future download rates improve because we are ranked higher in the system.
  • Downloading pieces of the file at the same time helps solve a common problem with other peer-to-peer download methods: Peers upload at a much slower rate than they download. By downloading multiple pieces at the same time, the overall speed is greatly improved. The more computers involved in the swarm, the faster the file transfer occurs because there are more sources of each piece of the file. For this reason, BitTorrent is especially useful for large, popular files.
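The piece-by-piece download described above can be sketched as follows. This is a simplified model and not the BitTorrent wire protocol: the `download` function, the peer names, the 4-byte piece size, and the least-loaded peer choice are assumptions for illustration, and the loop fetches pieces one at a time where a real client would fetch them concurrently. The one detail taken from the real protocol is that each piece is checked against a SHA-1 hash published in the torrent metadata.

```python
import hashlib

PIECE_SIZE = 4  # bytes; deliberately tiny so the example stays readable


def split_pieces(data, size=PIECE_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def download(swarm, piece_hashes):
    """Fetch every piece from some peer that has it, verifying each hash.

    swarm:        dict mapping peer name -> {piece index: piece bytes}
    piece_hashes: expected SHA-1 hex digest of each piece, in order
    Returns (file bytes, {piece index: peer the piece came from}).
    """
    load = {peer: 0 for peer in swarm}
    pieces, sources = [], {}
    for index, expected in enumerate(piece_hashes):
        holders = [p for p in swarm if index in swarm[p]]
        # Try the least-loaded holders first so work spreads across the swarm.
        for peer in sorted(holders, key=lambda p: load[p]):
            piece = swarm[peer][index]
            if hashlib.sha1(piece).hexdigest() == expected:
                load[peer] += 1
                pieces.append(piece)
                sources[index] = peer
                break
        else:
            raise LookupError(f"no peer has a valid copy of piece {index}")
    return b"".join(pieces), sources


original = b"distributed computing"
all_pieces = split_pieces(original)
hashes = [hashlib.sha1(p).hexdigest() for p in all_pieces]

swarm = {
    "seed": dict(enumerate(all_pieces)),             # has the complete file
    "peer1": {0: all_pieces[0], 1: all_pieces[1]},   # still downloading
    "peer2": {2: all_pieces[2], 3: b"junk"},         # holds one corrupt piece
}

data, sources = download(swarm, hashes)
print(data == original)
print(sources)
```

The corrupt piece offered by `peer2` fails its hash check and is fetched from another peer instead, which is how a client can safely accept pieces from computers it has no reason to trust.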





  • Open Grid Forum (OGF)
    A community-initiated forum of individual researchers and practitioners working on distributed computing, or "grid" technologies.
  • Mobile Information and Communication Systems
    Research on self organizing mobile ad hoc networking technology and infrastructure. Project proposals and publications.
  • IEEE Task Force on Cluster Computing
    The Task Force is concerned with issues related to the design, analysis, development and implementation of cluster-based systems.
  • The Israeli Association of Grid Technologies (IGT)
    The IGT is a non-profit organization of leading vendors, ISVs, customers and academia, focused on knowledge sharing and networking for developing Enterprise Grid solutions. It is open, independent and vendor-neutral.
  • Albatross - Wide Area Cluster Computing
    A project to better understand application behavior on wide-area networks. Publications, talks, software, status.
  • The Distributed ASCI Supercomputer (DAS)
    An experimental wide-area distributed computing cluster used for parallel computing research at five Dutch universities.
  • Flash Mob Computing
    Home of the first Flash Mob Supercomputer and the official site for all things Flash Mob Computing.
  • Cetus Links: Distributed Objects and Components
    Over a thousand links organized by the volunteer members of the Cetus organization.
  • Distributed Computing Primer
    Introduction to organising a distributed computing project.
  • Distributed computing gets a corporate twist
    "Grid" technology has largely been an academic phenomenon, but IBM gives the idea a corporate twist with its Grid Computing Initiative. [CNet News]
  • Is Distributed Computing a Crime?
    Computer network administrator faces multiple felony charges and years in prison for allegedly installing Distributed.net clients without permission. [SecurityFocus]
  • Distributed Systems
    Descriptions, bibliographies, and links to distributed operating systems, file systems, and computing environments.
  • The DC Zone
    open to other projects.
  • Distributed Computing: An Introduction
    From global distributed projects like Seti@Home to corporate uses behind the firewall, we cover the fundamentals of distributed computing architectures, discuss major initiatives and applications, and talk about the challenges that lie ahead.
  • Distributed.outrage
    University computer administrator accused of "hacking" crimes, for installing a screensaver that supported distributed computing research. [Salon]
  • Jeff Sutherland's Object Technology Site
    Breaking news on distributed computing, object technology, components, and business objects.
  • Kasparov Goes Bigger than Blue
    World chess champion Garry Kasparov, famous for tangling with IBM's Deep Blue supercomputer, takes on the world. His latest game uses distributed computing to connect any challenger. By Joyce Slaton. [Wired]
  • An Eye on Grid Technology
    Grid computing and grid services portal featuring articles, papers, news and streaming media content.
  • DCcentral
    Introduction to distributed computing in English, French, or Spanish. Includes tutorials, history, and projects.