Wednesday, April 3, 2019
A Guide Into Business Intelligence Studies Information Technology Essay
Data Warehousing

Integration of data from multiple sources into large warehouses, and support of on-line analytical processing and business decision making.

DW vs. Operational Databases

Data warehouse:
Subject oriented
Integrated
Nonvolatile
Time variant
Ad hoc retrieval

Operational databases:
Application oriented
Limited integration
Continuously updated
Current data values only
Predictable retrieval

Data Warehouse

A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.

Data Mart

A monothematic data warehouse
Department-oriented or business-line oriented

Top-Down Approach

Advantages:
A truly corporate effort, an enterprise view of data
Inherently architected, not a union of disparate data marts
Single, central storage of data about the content
Centralized rules and control
May see quick results if implemented with iterations

Disadvantages:
Takes longer to build, even with an iterative method
High exposure/risk to failure
Needs a high level of cross-functional skills
High outlay without proof of concept

Bottom-Up Approach

Advantages:
Faster and easier implementation of manageable pieces
Favorable return on investment and proof of concept
Less risk of failure
Inherently incremental; can schedule important data marts first
Allows the project team to learn and grow

Disadvantages:
Each data mart has its own narrow view of data
Permeates redundant data in every data mart
Perpetuates inconsistent and irreconcilable data
Proliferates unmanageable interfaces

Data Staging Component

Three major functions need to be performed to get the data ready (ETL):
Extract the data
Transform the data
Load the data into the data warehouse storage

Data Warehouse

Subject-oriented data: data is stored by subjects
Integrated data: need to pull together all the relevant data from the various
systems:
Data from internal operational systems
Data from outside sources
Time-variant data: an operational system stores only the current values, but the user needs data not only about the current purchase but also about past purchases
Nonvolatile data: data from the operational systems is moved into the data warehouse at specific intervals

Data Granularity

Data granularity in a data warehouse refers to the level of detail.
The lower the level of detail, the finer the data granularity.
The lowest level of detail means a lot of data in the data warehouse.

Four steps in dimensional modeling

Identify the process being modeled.
Determine the grain at which facts will be stored.
Choose the dimensions.
Identify the numeric measures for the facts.

Components of a star schema

Fact tables contain factual or quantitative data.
There is a 1:N relationship between dimension tables and fact tables.
Dimension tables contain descriptions about the subjects of the business.
Dimension tables are denormalized to maximize performance.

Slowly changing dimensions

Are the Customer and Product dimensions independent of the Time dimension?
Changes occur in names, family status, product district/region.
How can these changes be handled so as not to affect the history status? E.g.,
insurance.

Three suggestions for slowly changing dimensions

Type 1: overwrite/erase the old values. No accurate tracking of history is needed. Easy to implement.
Type 2: create a new record at the time of change. This partitions the history (old and new description).
Type 3: add a new "current" field. Used when there is a legitimate need to track both old and new states. Original and current values are kept; intermediate values are lost.

Junk Dimensions

Leave the flags in the fact tables: likely sparse data; no real browse entry capability; can significantly increase the size of the fact table.
Remove the attributes from the design: potentially critical information will be lost; if they provide no relevance, remove them.
Make each flag into its own dimension: may greatly increase the number of dimensions, increasing the size of the fact table; can clutter and confuse the design.
Combine all relevant flags, etc. into a single dimension: the number of possibilities remains finite; information is retained.

The Monster Dimension

It is a compromise.
Avoids creating copies of dimension records in a significantly large dimension.
Done to manage space and changes efficiently.

Three types of multidimensional data

Data from external sources (represented by the blue cylinder) is copied into the small red marble cube, which represents input multidimensional data.
Pre-calculated, stored results derived from it.
On-the-fly results, calculated as required at run-time, but not stored in a database.

Aggregation

The system uses physically stored aggregates as a way to enhance the performance of common queries.
These aggregates, like indexes, are chosen silently by the database if they are physically present.
End users and application developers do not need to know what aggregates are available at any point in time, and applications are not required to explicitly code the name of an aggregate.
As you move to higher levels of aggregation, the sparsity percentage goes down, and occupancy eventually reaches 100%.

Data Extraction

Two major types of data extraction from the source
operational systems: "as is" (static) data and data of revisions.
"As is" or static data is the capture of data at a given point in time. It is used for the initial load.
Data of revisions is known as incremental data capture.

Data Quality Issues

Dummy values in fields
Missing data
Unofficial use of fields
Cryptic values
Contradicting values
Reused primary keys
Inconsistent values
Incorrect values
Multipurpose fields

Steps in Data Cleansing

Parsing
Correcting
Standardizing
Matching
Consolidating

DATA TRANSFORMATION

All the extracted data must be made usable in the data warehouse.
The quality of the data in many old legacy systems is less likely to be good enough for the data warehouse.
Transformation of source data encompasses a wide variety of manipulations to change all the extracted source data into usable information to be stored in the data warehouse.
Data warehouse practitioners have attempted to classify data transformations in several ways.

Basic Tasks

Set of basic tasks:
Selection
Splitting/Joining
Conversion
Summarization
Enrichment

Loading

Initial load: load mode
Incremental loads: constructive merge mode; for a Type 1 slowly changing dimension, destructive merge mode
Full refresh: load and append modes are applicable

OLAP defined

On-line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
Users need the ability to perform multidimensional analysis with complex calculations.

The basic virtues of OLAP

Enables analysts, executives, and managers to gain useful insights from the presentation of data
Can reorganize metrics along several dimensions and allow data to be viewed from different perspectives
Supports multidimensional analysis
Is able to drill down or roll up within each dimension

BUSINESS METADATA

Is like
a roadmap or an easy-to-use information directory showing the contents and how to get to them. It answers questions such as:
How can I sign onto and connect with the data warehouse?
Which parts of the data warehouse can I access?
Can I see all the attributes from a specific table?
What are the definitions of the attributes I need in my query?
Are there any queries and reports already predefined to give the results I need?

TECHNICAL METADATA

Technical metadata is meant for the IT staff responsible for the development and administration of the data warehouse.
Technical metadata is like a support guide for the IT professionals to build, maintain, and administer the data warehouse.

Physical Design Objectives

Improve performance (in OLTP, 1-2 seconds at most; in a DW, seconds to minutes)
Ensure scalability
Manage storage
Provide ease of administration
Design for flexibility

Physical Design Steps

Develop standards
Create aggregates plan
Determine data partitioning
Establish clustering options
Prepare indexing strategy
Assign storage structures

Partitioning

Breaking data into several physical units that can be handled separately.
It is not a question of whether to do it in data warehouses, but how to do it.
Granularity and partitioning are key to the effective implementation of a warehouse.
Partitions are spread across multiple disks to boost performance.

Why Partition?

Flexibility in managing data. Smaller physical units allow:
Easy restructuring
Free indexing
Sequential scans if needed
Easy reorganization
Easy recovery
Easy monitoring
Improved performance

Criterion for Partitioning

Vertically (groups of selected columns together; more typical in dimension tables)
Horizontally (e.g. recent events and past history;
typical in fact tables)

Parallelization

The argument goes: if your main problem is that your queries run too slowly, use more than one machine at a time to make them run faster (parallel processing).
Oracle uses this strategy in its warehousing products.

Indexing

An index is a structure separate from the table data it refers to, storing the location of rows in the database based on the column values specified when the index is created.
Indexes are used in a data warehouse to improve warehouse throughput.
Indexing and loading
Indexing for large tables

B-tree characteristics

Balanced
Bushy: a multi-way tree
Block-oriented
Dynamic

Bitmap Index

Bitmap indices are a special type of index designed for efficient querying on multiple keys.
Records in a relation are assumed to be numbered sequentially from, say, 0. Given a number n, it must be easy to retrieve record n; this is particularly easy if records are of fixed size.
Bitmap indices are applicable to attributes that take on a relatively small number of distinct values, e.g. gender, country, state, or income level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000-infinity).
A bitmap is simply an array of bits.
In its simplest form, a bitmap index on an attribute has a bitmap for each value of the attribute.
Each bitmap has as many bits as there are records.
In a bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and 0 otherwise.

Clustering

The technique involves placing and managing related units of data in the same physical block of storage.
This arrangement causes related units of data to be retrieved together in one single operation.
In a clustering index, the order of the rows is close to the index order.
"Close" means that physical records containing rows will not have to be accessed more than once if the index is accessed sequentially.

DW Deployment

Major deployment activities:
Complete user acceptance
Perform initial loads
Get user desktops ready
Complete initial user training
Institute initial user support
Deploy in stages

DW Growth and Maintenance

Monitoring the DW
Collection of statistics
Usage of statistics: for growth planning; for fine tuning
User training: data content; applications; tools

Dimensional Modeling Exercise

Exercise: Create a star schema diagram that will enable FIT-WORLD GYM INC. to analyze their revenue. The fact table will include, for every instance of revenue taken, attribute(s) useful for analyzing revenue. The star schema will include all dimensions that can be useful for analyzing revenue. The only data sources available are shown below.

SOURCE 1: FIT-WORLD GYM operational database, ER diagram and the tables based on it (with data)

Response
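The source ER diagram and tables for the exercise are not reproduced in this post, so the following is not the exercise's solution. It is only a minimal sketch of the general shape a revenue star schema can take: one fact table holding a numeric measure, linked 1:N to denormalized dimension tables, with a roll-up along one dimension attribute. Every table name, column name, and data value here (dim_calendar, dim_customer, fact_revenue, revenue_by, the sample rows) is a hypothetical assumption made for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema and sample rows: none of these names or values come
# from the FIT-WORLD GYM source database, which is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_calendar (
    calendar_key INTEGER PRIMARY KEY,
    full_date    TEXT,
    month        TEXT,
    year         INTEGER
);
CREATE TABLE dim_customer (
    customer_key    INTEGER PRIMARY KEY,
    customer_name   TEXT,
    membership_type TEXT
);
-- Fact table: one row per instance of revenue, at the chosen grain.
CREATE TABLE fact_revenue (
    calendar_key INTEGER REFERENCES dim_calendar(calendar_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount       REAL
);
""")

conn.executemany("INSERT INTO dim_calendar VALUES (?, ?, ?, ?)", [
    (1, "2019-01-15", "2019-01", 2019),
    (2, "2019-02-10", "2019-02", 2019),
])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", [
    (1, "Customer A", "Gold"),
    (2, "Customer B", "Basic"),
])
conn.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)", [
    (1, 1, 100.0),
    (1, 2, 40.0),
    (2, 1, 100.0),
])


def revenue_by(attribute):
    """Roll revenue up along one dimension attribute (hypothetical helper)."""
    # Whitelist of attributes, so no user text is interpolated into the SQL.
    allowed = {
        "month": "dim_calendar.month",
        "membership_type": "dim_customer.membership_type",
    }
    column = allowed[attribute]
    rows = conn.execute(f"""
        SELECT {column}, SUM(f.amount)
        FROM fact_revenue AS f
        JOIN dim_calendar ON dim_calendar.calendar_key = f.calendar_key
        JOIN dim_customer ON dim_customer.customer_key = f.customer_key
        GROUP BY {column}
        ORDER BY {column}
    """).fetchall()
    return dict(rows)


print(revenue_by("month"))
print(revenue_by("membership_type"))
```

Each dimension row can be referenced by many fact rows, which is the 1:N relationship described in the star schema section. Adding another dimension would mean one more key column in the fact table and one more entry in the whitelist.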