Top ETL Interview Questions and Answers:
A list of top ETL (Extract, Transform, Load) interview questions and answers, with examples, follows:
- Compare between ETL and ELT.

| Criteria | ETL | ELT |
|---|---|---|
| Working methodology | Data is transformed before it moves from the source system to the data warehouse | Leverages the target system to transform the data after loading |
- What is an ETL process?
ETL is the process of Extraction, Transformation, and Loading.
- How many steps are there in an ETL process?
An ETL process has three steps: first, data is extracted from a source, such as database servers; it is then transformed according to business rules; finally, it is loaded into the target data warehouse.
- What are the steps involved in an ETL process?
The steps involved are defining the source and the target, creating the mapping, creating the session, and creating the workflow.
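The extract, transform, and load steps above can be sketched as a minimal pipeline. This is an illustrative example only; the table and column names (`raw_orders`, `clean_orders`) are hypothetical, and an in-memory SQLite database stands in for real source and target systems.

```python
import sqlite3

def extract(conn):
    """Extract: pull raw rows from the source table."""
    return conn.execute("SELECT id, amount FROM raw_orders").fetchall()

def transform(rows):
    """Transform: drop invalid rows and convert cents to dollars."""
    return [(rid, cents / 100.0) for rid, cents in rows if cents is not None]

def load(conn, rows):
    """Load: write the cleaned rows to the target table."""
    conn.executemany("INSERT INTO clean_orders VALUES (?, ?)", rows)

# Set up hypothetical source and target tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE clean_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                 [(1, 1250), (2, None), (3, 400)])

load(conn, transform(extract(conn)))
print(conn.execute("SELECT COUNT(*) FROM clean_orders").fetchone()[0])  # → 2
```

The invalid row (a NULL amount) is filtered out during the transform step, so only two rows reach the target.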
- Can there be sub-steps for each of the ETL steps?
Each of the steps involved in ETL has several sub-steps. The transform step has the most sub-steps.
- What are initial load and full load?
In ETL, the initial load is the process of populating all data warehouse tables for the very first time. In a full load, all records are loaded in one go, regardless of volume: the existing contents of the table are erased, and the fresh data is reloaded.
- What is meant by incremental load?
Incremental load refers to applying only the dynamic changes (new or updated records) as and when required, within a specific period and on predefined schedules.
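The difference between a full load and an incremental load can be sketched in a few lines of Python. This is an illustrative sketch, not any vendor's implementation; the `updated_at` watermark column and the in-memory "warehouse" are assumptions for the example.

```python
# Hypothetical sketch contrasting full load and incremental load.
# "updated_at" acts as the watermark column; all names are illustrative.

source = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": 30},
]

def full_load(target):
    """Full load: erase the target and reload everything from the source."""
    target.clear()
    target.extend(source)

def incremental_load(target, watermark):
    """Incremental load: apply only rows changed after the watermark."""
    new_rows = [r for r in source if r["updated_at"] > watermark]
    target.extend(new_rows)
    return max((r["updated_at"] for r in target), default=watermark)

warehouse = []
full_load(warehouse)                        # initial/full load: all 3 rows
source.append({"id": 4, "updated_at": 40})  # a new change arrives later
watermark = 30
watermark = incremental_load(warehouse, watermark)
print(len(warehouse), watermark)  # → 4 40
```

The full load is simple but expensive; the incremental load touches only what changed, which is why it is preferred on predefined schedules.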
- What is a 3-tier system in ETL?
The data warehouse is considered to be the 3-tier system in ETL.
- What are the three tiers in ETL?
The middle tier in ETL provides end users with usable data in a secure way. The other two tiers sit on either side of the middle tier: the end users and the back-end data storage.
- What are the names of the layers in ETL?
The first layer in ETL is the source layer, where the data lands. The second layer is the integration layer, where the data is stored after transformation. The third layer is the dimension layer, which serves as the actual presentation layer.
- What is meant by snapshots?
Snapshots are read-only copies of data stored in the master table.
- What are the characteristics of snapshots?
Snapshots are located on remote nodes and are refreshed periodically so that changes to the master table are recorded. They are also replicas of tables.
- What are views?
Views are built using the attributes of one or more tables. A view based on a single table can be updated, but a view based on multiple tables cannot.
- What is meant by a materialized view log?
A materialized view log is a table associated with the master table that records changes made to the master table's data. It allows a materialized view defined on that master table to be refreshed incrementally (fast refresh) instead of being fully rebuilt.
- What is a materialized view?
A materialized view is an aggregate table: a pre-computed table holding aggregated or joined data from the fact tables and the dimension tables, so the results do not have to be recomputed at query time.
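The idea of a materialized view can be demonstrated by precomputing an aggregate into its own table. This is a sketch only: SQLite is used for convenience, and the table names (`sales`, `mv_sales_by_region`) are hypothetical; real databases provide `CREATE MATERIALIZED VIEW` with managed refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# "Materialize" the aggregate: run the query once and store its result,
# so later reads hit the small precomputed table, not the fact table.
conn.execute("""
    CREATE TABLE mv_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region
""")

print(conn.execute(
    "SELECT total FROM mv_sales_by_region WHERE region = 'east'"
).fetchone()[0])  # → 150.0
```

The trade-off is staleness: if `sales` changes, the materialized table must be refreshed, which is exactly what a materialized view log makes cheap.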
- What is the difference between PowerCenter and PowerMart?
PowerCenter processes large volumes of data, whereas PowerMart processes small volumes of data.
- With which apps can PowerCenter be connected?
PowerCenter can be connected with ERP sources such as SAP, Oracle Apps, PeopleSoft, etc.
- Which partition is used to improve the performances of ETL transactions?
To improve the performances of ETL transactions, the session partition is used.
- Does PowerMart provide connections to ERP sources?
No, PowerMart does not provide connections to any of the ERP sources. It also does not support session partitioning.
- What is meant by partitioning in ETL?
Partitioning in ETL refers to the sub-division of the transactions in order to improve their performance.
- What is the benefit of increasing the number of partitions in ETL?
An increase in the number of partitions enables the Informatica server to create multiple connections to a host of sources.
- What are the types of partitions in ETL?
Types of partitions in ETL are Round-Robin partition and Hash partition.
- What is Round-Robin partitioning?
In Round-Robin partitioning, Informatica distributes the data evenly among all partitions. It is used when the number of rows to be processed in each partition is nearly the same.
- What is Hash partitioning?
In Hash partitioning, the Informatica server applies a hash function to the partition keys to group data among the partitions. It is used to ensure that all rows with the same partitioning key are processed in the same partition.
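The two partitioning schemes above can be sketched in plain Python. This is an illustrative model, not Informatica's actual implementation; the row layout and key name (`cust`) are assumptions for the example.

```python
# Illustrative sketch of Round-Robin vs. Hash partitioning.

def round_robin(rows, n):
    """Distribute rows evenly across n partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, key, n):
    """Send all rows with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

rows = [{"cust": c, "amt": a} for c, a in
        [("A", 1), ("B", 2), ("A", 3), ("C", 4)]]

rr = round_robin(rows, 2)
hp = hash_partition(rows, "cust", 2)

# Round-robin balances the row counts across partitions...
print([len(p) for p in rr])  # → [2, 2]
# ...while hash partitioning guarantees both "A" rows share a partition.
```

Round-robin optimizes for even workload; hash partitioning optimizes for key locality, e.g., so an aggregation per customer never needs to merge across partitions.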
- What is mapping in ETL?
Mapping refers to the flow of data from the source to the destination.
- What is a session in ETL?
A session is a set of instructions that describe the data movement from the source to the destination.
- What is meant by Worklet in ETL?
A Worklet is a set of tasks in ETL; it can group any set of tasks in a workflow.
- What is Workflow in ETL?
A Workflow is a set of instructions that tells the Informatica server how to execute the tasks.
- What is the use of Mapplet in ETL?
A Mapplet in ETL is used to create and configure a reusable group of transformations.
- What is meant by operational data store?
The operational data store (ODS) is the repository that exists between the staging area and the data warehouse. The data stored in ODS has low granularity.
- How does the operational data store work?
Aggregated data is loaded into the enterprise data warehouse (EDW) after it is populated in the operational data store (ODS). Basically, ODS is a semi-data warehouse (DWH) that allows analysts to analyze the business data. The data persistence period in ODS is usually in the range of 30–45 days and not more.
- What does the ODS in ETL generate?
The ODS in ETL generates primary keys and handles errors and rejected records, just like the DWH.
- When are the tables in ETL analyzed?
Tables are analyzed using the ANALYZE statement, which validates and computes statistics for a table, index, or cluster.
- How are the tables analyzed in ETL?
Statistics generated by the ANALYZE statement are used by the cost-based optimizer to calculate the most efficient plan for data retrieval. The ANALYZE statement can also validate the structures of objects and assist with space management in the system. Its operations include COMPUTE, ESTIMATE, and DELETE.
- How can the mapping be fine-tuned in ETL?
Steps for fine-tuning the mapping include: using a filter condition in the Source Qualifier to qualify the data instead of a Filter transformation, using a persistent cache store in the Lookup transformation, providing sorted input to the Aggregator transformation grouped by the relevant ports, using operators in expressions instead of functions, and increasing the cache size and commit interval.
- What are the differences between connected and unconnected lookups in ETL?
A connected lookup is used in the mapping flow and can return multiple values; it can be connected to another transformation and pass values on to it. An unconnected lookup is used when the lookup is not part of the main flow; it returns only a single output and cannot be connected to another transformation, but it is reusable.
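The contrast between the two lookup styles can be modeled with plain Python functions. This is only an analogy, not Informatica's API; the lookup table contents and field names are hypothetical.

```python
# Hypothetical model: a connected lookup sits in the row pipeline and can
# return several columns; an unconnected lookup is called on demand from
# an expression and returns exactly one value.

lookup_table = {101: {"name": "Alice", "tier": "gold"},
                102: {"name": "Bob",   "tier": "silver"}}

def connected_lookup(row):
    """In-pipeline lookup: enrich the row with multiple returned columns."""
    match = lookup_table.get(row["cust_id"], {})
    return {**row, **match}

def unconnected_lookup(cust_id):
    """Called like a function from an expression; returns a single value."""
    return lookup_table.get(cust_id, {}).get("tier")

row = {"cust_id": 101, "amount": 40}
enriched = connected_lookup(row)
print(enriched["name"], unconnected_lookup(102))  # → Alice silver
```

The connected version enriches every row flowing through it with several fields at once; the unconnected version is invoked only where needed and hands back one value, which mirrors why it is reusable across mappings.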