Aggregator Transformation



1)    Active Transformation
2)    Connected Transformation
3)    Aggregator transform is much like the Group by clause in traditional SQL
4)    The Aggregator transformation allows you to perform aggregate calculations such as averages and sums.
Components of the Aggregator Transformation
1.     Aggregate expression
2.     Group by port
3.     Sorted Input
4.     Aggregate cache
1) Aggregate Expressions
  • Entered in an output port.
  • Can include non-aggregate expressions and conditional clauses.
The transformation language includes the following aggregate functions:
  • AVG, COUNT, MAX, MIN, SUM
  • FIRST, LAST
  • MEDIAN, PERCENTILE, STDDEV, VARIANCE
Single Level Aggregate Function: MAX(SAL)
Nested Aggregate Function: MAX( COUNT( ITEM ))
Nested Aggregate Functions
  • In Aggregator transformation, there can be multiple single level functions or multiple nested functions.
  • An Aggregator transformation cannot have both types of functions together.
  • MAX( COUNT( ITEM )) is correct.
  • MIN(MAX( COUNT( ITEM ))) is not correct. It can also include one aggregate function nested within another aggregate function
Conditional Clauses
We can use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
  • SUM( COMMISSION, COMMISSION > QUOTA )
Non-Aggregate Functions
We can also use non-aggregate functions in the aggregate expression.
  • IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))
2) Group By Ports
  • Indicates how to create groups.
  • When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
The Aggregator transformation allows us to define groups for aggregations, rather than performing the aggregation across all input data.
For example, we can find Maximum Salary for every Department.
  • In Aggregator Transformation, Open Ports tab and select Group By as needed.
3) Using Sorted Input
  • Use to improve session performance.
  • To use sorted input, we must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
  • When we use this option, we tell Aggregator that data coming to it is already sorted.
  • We check the Sorted Input Option in Properties Tab of the transformation.
  • If the option is checked but we are not passing sorted data to the transformation, then the session fails.
4) Aggregator Caches
  • The Power Center Server stores data in the aggregate cache until it completes Aggregate calculations.
  • It stores group values in an index cache and row data in the data cache. If the Power Center Server requires more space, it stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. We do not need to configure cache memory for Aggregator transformations that use sorted ports.
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
  • All Group By Columns are in AGGREGATOR INDEX CACHE. Ex. DEPTNO
2) Aggregator Data Cache:
DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.
Columns in Data Cache:
  • Variable ports if any
  • Non group by input/output ports.
  • Non group by input ports used in non-aggregate output expression.
  • Port containing aggregate function



Can you copy the session to a different folder or repository

1) we can, but we have to copy the mapping first and then we
have to copy the session....


2) Folder :Yes
Repository: No.

if you want to move the session to another repo, then you have to export the session and then import it in the target 

Repo.
This assuming that the mapping has already been moved.


  3) In addition you can copy the workflow from the Repository manager.
            This will automatically copy the mapping associated source targets
            and session to the target folder. 


4) Yes. By using copy session wizard u can copy a session in a
            different folder or repository.But that
            target folder or repository should consists of mapping of that
            session.
            If target folder or repository is not having the maping of copying
            session ,
            u should have to copy that maping first before u copy the session
 

how can we delete the cache files while we using aggregator transformaation?

After the session executes DTM removes the cache memory and deletes   cachefiles             Incase of persistant cache and incremtal aggregation cache files are     saved.
            Generally cache files are created in Informatica sever no default  memory is allocated for this once memory exceeded than these (index  cache) files You can see this in cache directory.

 cache are automatically deleted by the power ceter server(i.e Data cache and Index cache) after session completes If you use persistent cache power center save that cache for further use
            

What is aggregate cache in aggregator transforamtion?

1) when server runs the session with aggregate transformation it stores   data in memory until it completes the aggregation   when u partition a source the server creates one memory cache and one disk cache for each partition .it routes the data from one   partition to another based on group key values of the transformation
             


2) It is place where all the rows entering the Aggregator Transformation will be placed till the aggregate calculations like sum,Avg are made.
all the group information will be stored in index files and row data in data cache.


3) when server runs the session with aggregate transformation it stores             data in memory until it completes the aggregation   when u partition a source the server creates one memory cache and one disk cache for each partition .it routes the data from one     partition to another based on group key values of the transformation
            

What are the reusable transforamtions?

Reusable transformations can be used in multiple mappings.
When you need to incorporate this transformation  into
maping, U add an instance of it to maping. Later if yoU
change the definition of the transformation ,all instances
of it inherit the changes. Since the instance of reusable
transforamation is a pointer to that transforamtion, U can
change the transforamation in the transformation developer,
its instances automatically reflect these  changes. This
feature can save yoU great deal of work.
 
 
 If you want to perform similar task for diff mappings, and 
if your logic is also similar for the diff requirements. in 
that case instead of creating multiple mappings, we will 
create it once and we will select the option as resuable.
so that we can reuse it in other mappings also.
but if you want to do the modifications in reusable mapping 
its not possible. you need to do it in original 
transformation it will will reflect all the other instances 
of that particular transformation

What are the methods for creating reusable transforamtions

1.Design it in the transformation developer.

2.Promote a standard transformation from the mapping
designer. After yoU add a transformation to the mapping ,
yoU can promote it to the status of reusable transformation.
Once yoU promote a standard transformation to reusable
status, yoU can demote it to a standard transformation at
any time.

If you change the properties of a reusable transformation in
mapping, yoU can revert it to the original reusable
transformation properties by clicking the revert button.

How to use incremental aggregation in real time?

The first time you run a session with incremental aggregation            enabled the server process the entire source.
 At the end of the   session the server stores aggregate data from that session ran in
            two files the index file and data file.
The server creates the file   in local directory.
The second time you run the session use only  changes in the source as source data for the session. The server then performs the following actions:For each input record the  session checks the historical information in the index file for a  corresponding group then:If it finds a corresponding group – The server performs the aggregate operation incrementally using the            aggregate data for that group and saves the incremental changes.Else            Server create a new group and saves the record dataWhen writing to   the target the server applies the changes to the existing            target.Updates modified aggregate groups in the targetInserts new            aggregate dataDelete removed aggregate dataIgnores unchanged            aggregate dataSaves modified aggregate data in Index/Data files to     be used as historical data the next time you run the session.Each            Subsequent time you run the session with incremental aggregation you            use only the incremental source changes in the session.If the source            changes significantly and you want the server to continue saving the aggregate data for the future incremental changes configure the
      server to overwrite existing aggregate data with new aggregate data.

can anyone explain about incremental aggregation with an example?

 When you use aggregator transformation to aggregate it creates index              and data caches to store the data 1.Of group By columns 2. Of   aggreagte columns             the incremental aggreagtion is used when we have historical data in              place which will be used in aggregation incremental aggregation uses  the cache which contains the historical data and for each group by  column value already present in cache it add the data value to its             corresponding data cache value and outputs the row in case of a             incoming value having no match in index cache the new values for             group by and output ports are inserted into the cache .

What is the use of incremental aggregation? Explain me in brief with an example.


1) its a session option. when the informatica server performs cremental aggr. it passes new source data through the mapping and   uses historical chache data to perform new aggregation caluculations     incrementaly. for performance we will use it.


2) Incremental aggregation is in session properties i have 500 records    in my source and again i got 300 records if u r not using incremental aggregation what are calculation r using on 500 records  again that calculation will be done on 500+ 300 records if u r using  incremental aggregation calculation will be done one only what are  new records (300) that will be calculated dur to this one   performance will increasing.



what is incremantal aggregation?

1) When using incremental aggregation, you apply captured changes in the 
source to aggregate calculations in a session.
 2) If the source changes only incrementally and you can capture changes, 
you can configure the session to process only those changes. 
3) This allows the Informatica Server to update your target incrementally,
 rather than forcing it to process the entire source and recalculate the 
same calculations each time you run the session.