Annotated bibliography literature: Fragment Allocation In Distributed Database Design

Fragment Allocation In Distributed Database Design

A distributed database is a database that consists of two or more data files located at different sites on a computer network. Because the database is distributed, different users can access it without interfering with one another. However, the DBMS must periodically synchronize the scattered databases to make sure that they all have consistent data. In other words, a distributed database is a database under the control of a central database management system (DBMS) in which the storage devices are not all attached to a common CPU. It may be stored on multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers.

Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Replication and distribution of databases improve database performance at end-user worksites.

To ensure that the distributed databases are up to date and current, there are two processes:
Replication.
Duplication.

Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time-consuming, depending on the size and number of the distributed databases. This process can also require a lot of time and computer resources.

Duplication, on the other hand, is not as complicated. It identifies one database as the master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed only to the master database.
This is to ensure that local data will not be overwritten. Both processes can keep the data current in all distributed locations.

Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented depends on the needs of the business and on the sensitivity/confidentiality of the data to be stored in the database. It therefore also depends on the price the business is willing to pay to ensure data security, consistency and integrity.

Basic architecture

A database user accesses the distributed database through:
Local applications: applications that do not require data from other sites.
Global applications: applications that do require data from other sites.

A distributed database does not share main memory or disks.

Main Features and Benefits of a Distributed System

A common misconception when discussing distributed systems is that a distributed system is just another name for a network of computers. However, this overlooks an important distinction. A distributed system is built on top of a network and tries to hide the existence of multiple autonomous computers. It appears as a single entity providing the user with whatever services are required. A network is a medium for interconnecting entities such as computers and devices. It enables these entities to exchange messages based on well-known protocols, and each entity is explicitly addressable (using an IP address, for example).

There are various types of distributed systems, such as clusters [3], grids [4], P2P (peer-to-peer) networks, distributed storage systems and so on. A cluster is a dedicated group of interconnected computers that appears as a single supercomputer. Clusters are generally used in high-performance scientific, engineering and business applications.
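The replication and duplication processes described earlier can be contrasted in a short sketch. It is a simplified illustration and not the mechanism of any particular DBMS. Each site is modeled as an in-memory dictionary, and the names `Site`, `replicate` and `duplicate` are hypothetical. Replication gathers changes from every site and resolves conflicting writes with a last-writer-wins rule, which is an assumed policy. Duplication simply copies a snapshot of the master over every other site.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One distributed location holding a copy of the data (hypothetical model)."""
    name: str
    # key -> (value, logical timestamp of the last write to that key)
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self.data[key] = (value, ts)


def replicate(sites):
    """Replication: collect the changes made at every site and propagate
    them so that all copies look the same afterwards.

    Conflicting writes to the same key are resolved last-writer-wins
    (newest timestamp); this policy is an assumption of the sketch.
    """
    merged = {}
    for site in sites:
        for key, (value, ts) in site.data.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    for site in sites:
        site.data = dict(merged)  # each site gets its own copy


def duplicate(master, copies):
    """Duplication: overwrite every copy with a snapshot of the master.

    Anything written locally at a copy is lost, which is why only the
    master should accept changes.
    """
    for site in copies:
        site.data = dict(master.data)


if __name__ == "__main__":
    a, b, c = Site("A"), Site("B"), Site("C")
    a.write("x", 1, ts=1)
    b.write("x", 2, ts=2)  # newer write to the same key
    c.write("y", 9, ts=1)
    replicate([a, b, c])
    print(a.data == b.data == c.data)  # True
    print(a.data["x"])                 # (2, 2)
```

`duplicate` discards anything written at a copy without warning. This is why the duplication approach allows changes only at the master.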
A grid is a type of distributed system that enables coordinated sharing and aggregation of distributed, autonomous, heterogeneous resources based on users' QoS (Quality of Service) requirements. Grids are commonly used to support emerging applications in the areas of e-Science and e-Business. These applications commonly involve geographically distributed communities of people who engage in collaborative activities to solve large-scale problems and who need to share resources such as computers, data, applications and scientific instruments. P2P networks are decentralized distributed systems that enable applications such as file-sharing, instant messaging, online multiuser gaming and content distribution over existing networks. Distributed storage systems such as NFS (Network File System) provide users with a unified view of data stored on different file systems and computers, which may be on the same network or on different networks.

The main features of a distributed system include:
Functional separation: based on the functionality/services provided, capability and purpose of each entity in the system.
Inherent distribution: entities such as information, people, and systems are inherently distributed. For example, different information is created and maintained by different people. This information could be generated, stored, analyzed and used by different systems or applications, which may or may not be aware of the existence of the other entities in the system.
Reliability: long-term data preservation and backup (replication) at different locations.
Scalability: addition of more resources to increase performance or availability.
Economy: sharing of resources by many entities to help reduce the cost of ownership.

As a consequence of these features, the various entities in a distributed system can operate concurrently and possibly autonomously. Tasks are carried out independently, and actions are co-ordinated at well-defined stages by exchanging messages.
Also, entities are heterogeneous, and failures are independent. Generally, there is no single process or entity that has knowledge of the entire state of the system.

Various kinds of distributed systems operate today, each aimed at solving different kinds of problems. The challenges faced in building a distributed system vary depending on the requirements of the system. In general, however, most systems will need to handle the following issues:
Heterogeneity: the various entities in the system must be able to interoperate with one another, despite differences in hardware architectures, operating systems, communication protocols, programming languages, software interfaces, security models, and data formats.
Transparency: the entire system should appear as a single unit, and the complexity of the components and their interactions should typically be hidden from the end user.
Fault tolerance and failure management: the failure of one or more components should not bring down the entire system, and failures should be isolated.
Scalability: the system should work efficiently with an increasing number of users, and the addition of a resource should enhance the performance of the system.
Concurrency: shared access to resources should be made possible.
Openness and extensibility: interfaces should be cleanly separated and publicly available, so that existing components can easily be extended and new components can be added.
Migration and load balancing: tasks can be moved within the system without affecting the operation of users or applications, and load is distributed among the available resources to improve performance.
Security: access to resources should be secured to ensure that only known users are able to perform allowed operations.
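The migration and load-balancing requirement can be illustrated with a small sketch that assigns each task to the node with the smallest current load. The greedy largest-task-first rule and the function name `assign_least_loaded` are assumptions chosen for brevity. A production scheduler would also consider node speed, data locality and failures.

```python
import heapq


def assign_least_loaded(task_costs, nodes):
    """Greedy load balancing: give each task to the currently least-loaded node.

    task_costs: estimated cost of each task (list of numbers).
    nodes: names of the available nodes.
    Returns a dict mapping node name -> list of task indices.
    """
    # Heap entries are (current_load, node); heappop returns the lightest node.
    heap = [(0, node) for node in nodes]
    heapq.heapify(heap)
    assignment = {node: [] for node in nodes}
    # Placing the largest tasks first (the LPT rule) tends to even out the loads.
    order = sorted(range(len(task_costs)), key=lambda i: task_costs[i], reverse=True)
    for i in order:
        load, node = heapq.heappop(heap)
        assignment[node].append(i)
        heapq.heappush(heap, (load + task_costs[i], node))
    return assignment


if __name__ == "__main__":
    costs = [5, 3, 3, 2, 1]
    plan = assign_least_loaded(costs, ["n1", "n2"])
    print(plan)  # {'n1': [0, 3], 'n2': [1, 2, 4]}
    print({n: sum(costs[i] for i in ids) for n, ids in plan.items()})  # {'n1': 7, 'n2': 7}
```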
Several software companies and research institutions have developed distributed computing technologies that support some or all of the features described above.

Fragment Allocation in Distributed Database Design

On a Wide Area Network (WAN), fragment allocation is a major issue in distributed database design because it affects the overall performance of distributed database systems. Here we propose a simple and comprehensive model that reflects transaction behavior in distributed databases. Based on the model and on transaction information, two heuristic algorithms are developed to find a near-optimal allocation that minimizes the total communication cost as far as possible. The results show that the fragment allocation found by the algorithms is close to optimal. Some experiments were also conducted to verify that the cost formulas truly reflect the communication cost in the real world.

INTRODUCTION

Distributed database design involves the following interrelated issues:
(1) How a global relation should be fragmented,
(2) How many copies of a fragment should be replicated,
(3) How fragments should be allocated to the sites of the communication network,
(4) What information is necessary for fragmentation and allocation.

These issues complicate distributed database design. Even if each issue is considered individually, it is still an intractable problem. To simplify the overall problem, we address only the fragment allocation issue and assume that all global relations have already been fragmented. The problem investigated here is therefore twofold. First, determine the number of replicas of each fragment. Second, find a near-optimal allocation of all fragments, including the replicated ones, in a Wide Area Network (WAN) such that the total communication cost is minimized.
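The allocation problem can be made concrete with a minimal sketch under an assumed cost model. This is not the model or either of the two heuristic algorithms proposed in the paper, whose formulas are not reproduced here. The sketch makes these assumptions:
- `dist[i][j]` is the communication cost per unit of data between sites i and j.
- Each fragment has a size and per-site read and write frequencies.
- A read is served from the nearest copy.
- A write is propagated from the issuing site to every copy.

The greedy heuristic (`greedy_allocate`, a hypothetical name) places one copy at the cheapest site and then adds replicas while they lower the total cost.

```python
def fragment_cost(copies, size, reads, writes, dist):
    """Communication cost of one fragment placed at the sites in `copies`.

    Hypothetical cost model: each site reads from its nearest copy, and each
    write is propagated from the issuing site to every copy.
    """
    total = 0.0
    for s in range(len(dist)):
        nearest = min(dist[s][c] for c in copies)
        total += reads[s] * size * nearest
        total += writes[s] * size * sum(dist[s][c] for c in copies)
    return total


def greedy_allocate(size, reads, writes, dist):
    """Greedy heuristic: start from the best single site, then keep adding
    the replica that lowers the cost most, until no addition helps."""
    sites = range(len(dist))

    def cost(copies):
        return fragment_cost(copies, size, reads, writes, dist)

    copies = {min(sites, key=lambda c: cost({c}))}
    best_cost = cost(copies)
    while True:
        candidates = [s for s in sites if s not in copies]
        if not candidates:
            break
        site = min(candidates, key=lambda s: cost(copies | {s}))
        new_cost = cost(copies | {site})
        if new_cost >= best_cost:
            break  # another replica would cost more than it saves
        copies.add(site)
        best_cost = new_cost
    return copies, best_cost


if __name__ == "__main__":
    # dist[i][j]: communication cost per unit of data between sites i and j
    dist = [[0, 1, 4],
            [1, 0, 2],
            [4, 2, 0]]
    # Read-mostly fragment: replicating it everywhere pays off.
    print(greedy_allocate(1, [10, 10, 10], [1, 0, 0], dist))  # ({0, 1, 2}, 5.0)
    # Write-heavy fragment: a single copy at the central site is cheapest.
    print(greedy_allocate(1, [1, 1, 1], [5, 5, 5], dist))     # ({1}, 18.0)
```

In this simplified model fragments are independent, so the total cost of an allocation is the sum of the per-fragment results. The two examples show the trade-off behind the replication question. Frequent reads favour more replicas. Frequent writes favour fewer, because every extra copy adds write-propagation cost.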
A read request issued by a transaction may simply load the target fragment at the issuing site, or it may need the more complicated step of loading the target fragment from a remote site. A write request can be the most complicated. If multiple copies of a fragment are spread throughout the network, a write must be propagated to maintain consistency among all of them. The frequency of each request issued at the sites must also be considered in the allocation model. Different transaction behaviors may result in different optimal fragment allocations, so cost formulas should be derived that minimize the transaction cost according to the transaction information.

Alchemi: An example distributed system

In a typical corporate or academic environment there are many resources that are under-utilized for long periods of time. A resource in this context means any entity that could be used to fulfill a user requirement, including compute power (CPU), data storage, applications, and services. An enterprise grid is a distributed system that dynamically aggregates and co-ordinates various resources within an organization. It improves their utilization so that productivity increases overall for the users and processes. These benefits ultimately result in large cost savings for the business, since it will not need to purchase expensive equipment to run its high-performance applications.

The desirable features of an enterprise grid system are:
Enabling efficient and optimal resource usage.
Sharing of inter-organizational resources.
Secure authentication and authorization of users.
Security of stored data and programs.
Secure communication.
Centralized / semi-centralized control.
Auditing.
Enforcement of Quality of Service (QoS) and Service Level Agreements (SLA).
Interoperability of different grids (and hence a basis in open standards).
Support for transactional processes.

Alchemi is an enterprise grid computing framework developed by researchers at the GRIDS Lab, in the Computer Science and Software Engineering Department at the University of Melbourne, Australia. It allows the user to aggregate the computing power of networked machines into a virtual supercomputer. Users can then develop applications to run on the grid with no additional investment and no discernible impact on users. The main features offered by the Alchemi framework are:
Virtualization of compute resources across the LAN / Internet.
Ease of deployment and management.
Object-oriented grid thread programming model for grid application development.
File-based grid job model for grid-enabling legacy applications.
Web services interface for interoperability with other grid middleware.
Open-source, .NET-based, simple installation using Windows installers.

Alchemi grids follow the master-slave architecture. They can also connect multiple masters in a hierarchical or peer-to-peer fashion to make the system scalable. An Alchemi grid has three types of components: the Manager, the Executor, and the user application itself. The Manager node is the master/controller, and its main function is to service user requests for workload distribution. It receives a user request, authenticates the user, and distributes the workload across the various Executors that are connected to it. The Executor node is the one that actually performs the computation. Alchemi uses role-based security to authenticate users and authorize execution.
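The Manager/Executor division of labour described above can be sketched as follows. This is a hypothetical Python model and not Alchemi's actual .NET API. The classes `Manager` and `Executor`, the user table and the role-to-permission mapping are all illustrative assumptions. The Manager authenticates the user and checks that the user's role allows execution, a simple form of role-based security. It then hands work units to connected Executors in round-robin order.

```python
from itertools import cycle


class Executor:
    """Worker node: actually performs the computation it is given."""

    def __init__(self, name):
        self.name = name
        self.completed = []  # ids of the work units this node has run

    def run(self, task_id, func, arg):
        result = func(arg)
        self.completed.append(task_id)
        return result


class Manager:
    """Master node: authenticates the user, checks that the user's role may
    execute work, then distributes the work units across its Executors."""

    def __init__(self, users, role_permissions):
        self.users = users                        # username -> (password, role)
        self.role_permissions = role_permissions  # role -> set of allowed actions
        self.executors = []

    def connect(self, executor):
        self.executors.append(executor)

    def submit(self, username, password, func, args):
        record = self.users.get(username)
        # Plain-text comparison keeps the sketch short; a real system
        # would compare salted password hashes instead.
        if record is None or record[0] != password:
            raise PermissionError("authentication failed")
        if "execute" not in self.role_permissions.get(record[1], set()):
            raise PermissionError("role may not execute work")
        if not self.executors:
            raise RuntimeError("no executors connected")
        # Round-robin: work unit i goes to executor i mod len(executors).
        # Units run one after another here; a real grid runs them in parallel.
        return [ex.run(i, func, arg)
                for i, (ex, arg) in enumerate(zip(cycle(self.executors), args))]


if __name__ == "__main__":
    mgr = Manager({"alice": ("s3cret", "user"), "guest": ("pw", "viewer")},
                  {"user": {"execute"}, "viewer": set()})
    e1, e2 = Executor("e1"), Executor("e2")
    mgr.connect(e1)
    mgr.connect(e2)
    print(mgr.submit("alice", "s3cret", lambda n: n * n, [1, 2, 3]))  # [1, 4, 9]
    print(e1.completed, e2.completed)  # [0, 2] [1]
```

A request from `guest` would raise `PermissionError`. The user authenticates, but the `viewer` role carries no `execute` permission.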
A simple grid is created by installing Executors on each machine that is to be part of the grid and linking them to a central Manager component.

Advantages of distributed databases
Management of distributed data with different levels of transparency.
Increased reliability and availability.
Easier expansion.
Reflects organizational structure: database fragments are located in the departments they relate to.
Local autonomy: a department can control the data about it (as its staff are the ones most familiar with it).
Protection of valuable data: if there were ever a catastrophic event such as a fire, the data would not all be in one place, but distributed across multiple locations.
Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, so load on the databases can be balanced among servers. (In a distributed database, a high load on one module of the database does not affect other modules of the database.)
Economics: it costs less to create a network of smaller computers with the power of a single large computer.
Modularity: systems can be modified, added to and removed from the distributed database without affecting other modules (systems).
Reliable transactions: due to replication of the database.
Hardware, operating system, network, fragmentation, DBMS, replication and location independence.
Continuous operation.
Distributed query processing.
Distributed transaction management.

Disadvantages of distributed databases
Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one.
Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
Security: remote database fragments must be secured, and since they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
Inexperience: distributed databases are difficult to work with, and because the field is young there is not much readily available experience on proper practice.
Lack of standards: there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
More complex database design: besides the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites, and data replication.
Additional software is required.
The operating system should support a distributed environment.
Concurrency control: this is a major issue. It is solved by locking and timestamping.
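Of the two concurrency-control techniques mentioned, timestamping can be illustrated with the classic basic timestamp-ordering rules. The sketch below is a single-site, in-memory simplification with these assumptions:
- Timestamps are unique, increasing integers.
- The Thomas write rule is not applied.
- Restarting aborted transactions is left to the caller.

The class names are hypothetical. A transaction is aborted whenever its operation arrives "too late", that is, after a younger transaction has already touched the same item.

```python
class Aborted(Exception):
    """Raised when a transaction must be rolled back and restarted."""


class TimestampScheduler:
    """Basic timestamp ordering (T/O) over a set of data items.

    Assumes every transaction gets a unique positive integer timestamp when
    it starts; older transactions have smaller timestamps.
    """

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> timestamp of the last accepted write
        self.values = {}

    def read(self, ts, item):
        # Too late: a younger transaction has already overwritten the item.
        if ts < self.write_ts.get(item, 0):
            raise Aborted(f"T{ts} cannot read {item}")
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return self.values.get(item)

    def write(self, ts, item, value):
        # Too late: a younger transaction has already read or written the item.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            raise Aborted(f"T{ts} cannot write {item}")
        self.write_ts[item] = ts
        self.values[item] = value


if __name__ == "__main__":
    s = TimestampScheduler()
    s.write(1, "x", 10)
    print(s.read(2, "x"))  # 10
    try:
        s.write(1, "x", 99)  # T1 writes after the younger T2 has read x
    except Aborted as exc:
        print("aborted:", exc)  # aborted: T1 cannot write x
```

Locking would instead make conflicting transactions wait. Timestamp ordering never blocks, but it pays for this with aborts and restarts when conflicts are frequent.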

Tuesday, June 4, 2019
