Convert from many sub-tables to a single tidy dataframe

I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.

The definition of tidy here is that the output has the following columns:

swap_id a unique identifier.

leg either pay or rec.

cashflow the amount that is recieved

date the date the cashflow arrives

type whether the cashflow is of INTEREST of FINAL_EX.

The variable leg takes on two different values: pay and rec. Currently leg=pay is in one excel sheet and leg=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.

I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w

Here is a sample of the content of sheet "rec":

    id1                                 id2         

    date        cf      type            date        cf        type  

    2017-04-04  42961   INTEREST        2015-04-07  33953   INTEREST    

    2017-07-04  43438   INTEREST        2016-04-04  203161  INTEREST    

    2017-10-04  43915   INTEREST        2017-04-04  203161  INTEREST    

    2018-01-04  43915   INTEREST        2018-04-04  203161  INTEREST    

    2018-04-04  42961   INTEREST        2019-04-04  203161  INTEREST    

    2018-07-04  43438   INTEREST        2020-04-06  203161  INTEREST    

    2018-10-04  43915   INTEREST        2021-04-06  203161  INTEREST    

    2019-01-04  43915   INTEREST        2022-04-04  203161  INTEREST    

    2019-04-04  42961   INTEREST        2023-04-04  203161  INTEREST    

    2019-07-04  43438   INTEREST        2023-04-04  5016330 FINAL_EX    

    2019-10-04  43915   INTEREST                    

    2020-01-07  43915   INTEREST                    

    2020-04-06  43438   INTEREST                    

    2020-07-06  43438   INTEREST                    

    2020-10-05  43915   INTEREST                    

    2021-01-04  43915   INTEREST                    

    2021-04-06  42961   INTEREST                    

    2021-07-05  43438   INTEREST                    

    2021-10-04  43915   INTEREST                    

    2021-10-04  2988563 FINAL_EX

Here is a sample of the content of sheet "pay":

    id1                                 id2     

    date        cf      type            date        cf      type

    2017-04-04  5250    INTEREST        2015-04-07  30938   INTEREST

    2017-07-04  5308    INTEREST        2016-04-04  30938   INTEREST

    2017-10-04  5367    INTEREST        2017-04-04  30938   INTEREST

    2018-01-04  5367    INTEREST        2018-04-04  30938   INTEREST

    2018-04-04  5250    INTEREST        2019-04-04  30938   INTEREST

    2018-07-04  5308    INTEREST        2020-04-06  30938   INTEREST

    2018-10-04  5367    INTEREST        2021-04-06  30938   INTEREST

    2019-01-04  5367    INTEREST        2022-04-04  30938   INTEREST

    2019-04-04  5250    INTEREST        2023-04-04  30938   INTEREST

    2019-07-04  5308    INTEREST        2023-04-04  540000  FINAL_EX

    2019-10-04  5367    INTEREST                

    2020-01-06  5367    INTEREST                

    2020-04-06  5308    INTEREST                

    2020-07-06  5308    INTEREST                

    2020-10-05  5367    INTEREST                

    2021-01-04  5367    INTEREST                

    2021-04-06  5250    INTEREST                

    2021-07-05  5308    INTEREST                

    2021-10-04  5367    INTEREST                

    2021-10-04  315000  FINAL_EX

In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.

Here is how I want the output to be structured after the transformation.

    swap_id leg date        cf      type

    id1     pay 2017-04-04  5250    INTEREST

    id1     pay 2017-07-04  5308    INTEREST

    id1     pay 2017-10-04  5367    INTEREST

    id1     pay 2018-01-04  5367    INTEREST

    id1     pay 2018-04-04  5250    INTEREST

    id1     pay 2018-07-04  5308    INTEREST

    id1     pay 2018-10-04  5367    INTEREST

    id1     pay 2019-01-04  5367    INTEREST

    id1     pay 2019-04-04  5250    INTEREST

    id1     pay 2019-07-04  5308    INTEREST

    id1     pay 2019-10-04  5367    INTEREST

    id1     pay 2020-01-06  5367    INTEREST

    id1     pay 2020-04-06  5308    INTEREST

    id1     pay 2020-07-06  5308    INTEREST

    id1     pay 2020-10-05  5367    INTEREST

    id1     pay 2021-01-04  5367    INTEREST

    id1     pay 2021-04-06  5250    INTEREST

    id1     pay 2021-07-05  5308    INTEREST

    id1     pay 2021-10-04  5367    INTEREST

    id1     pay 2021-10-04  315000  FINAL_EX

    id1     rec 2017-04-04  42961   INTEREST

    id1     rec 2017-07-04  43438   INTEREST

    id1     rec 2017-10-04  43915   INTEREST

    id1     rec 2018-01-04  43915   INTEREST

    id1     rec 2018-04-04  42961   INTEREST

    id1     rec 2018-07-04  43438   INTEREST

    id1     rec 2018-10-04  43915   INTEREST

    id1     rec 2019-01-04  43915   INTEREST

    id1     rec 2019-04-04  42961   INTEREST

    id1     rec 2019-07-04  43438   INTEREST

    id1     rec 2019-10-04  43915   INTEREST

    id1     rec 2020-01-07  43915   INTEREST

    id1     rec 2020-04-06  43438   INTEREST

    id1     rec 2020-07-06  43438   INTEREST

    id1     rec 2020-10-05  43915   INTEREST

    id1     rec 2021-01-04  43915   INTEREST

    id1     rec 2021-04-06  42961   INTEREST

    id1     rec 2021-07-05  43438   INTEREST

    id1     rec 2021-10-04  43915   INTEREST

    id1     rec 2021-10-04  2988563 FINAL_EX

    id2 …   …   …   … and so on...

asked May 24 '18 at 10:01

jacob

148110

bumped to the homepage by Community♦ 4 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57

add a comment |

The definition of tidy here is that the output has the following columns:

swap_id a unique identifier.

leg either pay or rec.

cashflow the amount that is recieved

date the date the cashflow arrives

type whether the cashflow is of INTEREST of FINAL_EX.

The variable leg takes on two different values: pay and rec. Currently leg=pay is in one excel sheet and leg=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.

I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w

Here is a sample of the content of sheet "rec":

    id1                                 id2         

    date        cf      type            date        cf        type  

    2017-04-04  42961   INTEREST        2015-04-07  33953   INTEREST    

    2017-07-04  43438   INTEREST        2016-04-04  203161  INTEREST    

    2017-10-04  43915   INTEREST        2017-04-04  203161  INTEREST    

    2018-01-04  43915   INTEREST        2018-04-04  203161  INTEREST    

    2018-04-04  42961   INTEREST        2019-04-04  203161  INTEREST    

    2018-07-04  43438   INTEREST        2020-04-06  203161  INTEREST    

    2018-10-04  43915   INTEREST        2021-04-06  203161  INTEREST    

    2019-01-04  43915   INTEREST        2022-04-04  203161  INTEREST    

    2019-04-04  42961   INTEREST        2023-04-04  203161  INTEREST    

    2019-07-04  43438   INTEREST        2023-04-04  5016330 FINAL_EX    

    2019-10-04  43915   INTEREST                    

    2020-01-07  43915   INTEREST                    

    2020-04-06  43438   INTEREST                    

    2020-07-06  43438   INTEREST                    

    2020-10-05  43915   INTEREST                    

    2021-01-04  43915   INTEREST                    

    2021-04-06  42961   INTEREST                    

    2021-07-05  43438   INTEREST                    

    2021-10-04  43915   INTEREST                    

    2021-10-04  2988563 FINAL_EX

Here is a sample of the content of sheet "pay":

    id1                                 id2     

    date        cf      type            date        cf      type

    2017-04-04  5250    INTEREST        2015-04-07  30938   INTEREST

    2017-07-04  5308    INTEREST        2016-04-04  30938   INTEREST

    2017-10-04  5367    INTEREST        2017-04-04  30938   INTEREST

    2018-01-04  5367    INTEREST        2018-04-04  30938   INTEREST

    2018-04-04  5250    INTEREST        2019-04-04  30938   INTEREST

    2018-07-04  5308    INTEREST        2020-04-06  30938   INTEREST

    2018-10-04  5367    INTEREST        2021-04-06  30938   INTEREST

    2019-01-04  5367    INTEREST        2022-04-04  30938   INTEREST

    2019-04-04  5250    INTEREST        2023-04-04  30938   INTEREST

    2019-07-04  5308    INTEREST        2023-04-04  540000  FINAL_EX

    2019-10-04  5367    INTEREST                

    2020-01-06  5367    INTEREST                

    2020-04-06  5308    INTEREST                

    2020-07-06  5308    INTEREST                

    2020-10-05  5367    INTEREST                

    2021-01-04  5367    INTEREST                

    2021-04-06  5250    INTEREST                

    2021-07-05  5308    INTEREST                

    2021-10-04  5367    INTEREST                

    2021-10-04  315000  FINAL_EX

In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.

Here is how I want the output to be structured after the transformation.

    swap_id leg date        cf      type

    id1     pay 2017-04-04  5250    INTEREST

    id1     pay 2017-07-04  5308    INTEREST

    id1     pay 2017-10-04  5367    INTEREST

    id1     pay 2018-01-04  5367    INTEREST

    id1     pay 2018-04-04  5250    INTEREST

    id1     pay 2018-07-04  5308    INTEREST

    id1     pay 2018-10-04  5367    INTEREST

    id1     pay 2019-01-04  5367    INTEREST

    id1     pay 2019-04-04  5250    INTEREST

    id1     pay 2019-07-04  5308    INTEREST

    id1     pay 2019-10-04  5367    INTEREST

    id1     pay 2020-01-06  5367    INTEREST

    id1     pay 2020-04-06  5308    INTEREST

    id1     pay 2020-07-06  5308    INTEREST

    id1     pay 2020-10-05  5367    INTEREST

    id1     pay 2021-01-04  5367    INTEREST

    id1     pay 2021-04-06  5250    INTEREST

    id1     pay 2021-07-05  5308    INTEREST

    id1     pay 2021-10-04  5367    INTEREST

    id1     pay 2021-10-04  315000  FINAL_EX

    id1     rec 2017-04-04  42961   INTEREST

    id1     rec 2017-07-04  43438   INTEREST

    id1     rec 2017-10-04  43915   INTEREST

    id1     rec 2018-01-04  43915   INTEREST

    id1     rec 2018-04-04  42961   INTEREST

    id1     rec 2018-07-04  43438   INTEREST

    id1     rec 2018-10-04  43915   INTEREST

    id1     rec 2019-01-04  43915   INTEREST

    id1     rec 2019-04-04  42961   INTEREST

    id1     rec 2019-07-04  43438   INTEREST

    id1     rec 2019-10-04  43915   INTEREST

    id1     rec 2020-01-07  43915   INTEREST

    id1     rec 2020-04-06  43438   INTEREST

    id1     rec 2020-07-06  43438   INTEREST

    id1     rec 2020-10-05  43915   INTEREST

    id1     rec 2021-01-04  43915   INTEREST

    id1     rec 2021-04-06  42961   INTEREST

    id1     rec 2021-07-05  43438   INTEREST

    id1     rec 2021-10-04  43915   INTEREST

    id1     rec 2021-10-04  2988563 FINAL_EX

    id2 …   …   …   … and so on...

asked May 24 '18 at 10:01

jacob

148110

bumped to the homepage by Community♦ 4 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57

add a comment |

The definition of tidy here is that the output has the following columns:

swap_id a unique identifier.

leg either pay or rec.

cashflow the amount that is recieved

date the date the cashflow arrives

type whether the cashflow is of INTEREST of FINAL_EX.

The variable leg takes on two different values: pay and rec. Currently leg=pay is in one excel sheet and leg=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.

I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w

Here is a sample of the content of sheet "rec":

    id1                                 id2         

    date        cf      type            date        cf        type  

    2017-04-04  42961   INTEREST        2015-04-07  33953   INTEREST    

    2017-07-04  43438   INTEREST        2016-04-04  203161  INTEREST    

    2017-10-04  43915   INTEREST        2017-04-04  203161  INTEREST    

    2018-01-04  43915   INTEREST        2018-04-04  203161  INTEREST    

    2018-04-04  42961   INTEREST        2019-04-04  203161  INTEREST    

    2018-07-04  43438   INTEREST        2020-04-06  203161  INTEREST    

    2018-10-04  43915   INTEREST        2021-04-06  203161  INTEREST    

    2019-01-04  43915   INTEREST        2022-04-04  203161  INTEREST    

    2019-04-04  42961   INTEREST        2023-04-04  203161  INTEREST    

    2019-07-04  43438   INTEREST        2023-04-04  5016330 FINAL_EX    

    2019-10-04  43915   INTEREST                    

    2020-01-07  43915   INTEREST                    

    2020-04-06  43438   INTEREST                    

    2020-07-06  43438   INTEREST                    

    2020-10-05  43915   INTEREST                    

    2021-01-04  43915   INTEREST                    

    2021-04-06  42961   INTEREST                    

    2021-07-05  43438   INTEREST                    

    2021-10-04  43915   INTEREST                    

    2021-10-04  2988563 FINAL_EX

Here is a sample of the content of sheet "pay":

    id1                                 id2     

    date        cf      type            date        cf      type

    2017-04-04  5250    INTEREST        2015-04-07  30938   INTEREST

    2017-07-04  5308    INTEREST        2016-04-04  30938   INTEREST

    2017-10-04  5367    INTEREST        2017-04-04  30938   INTEREST

    2018-01-04  5367    INTEREST        2018-04-04  30938   INTEREST

    2018-04-04  5250    INTEREST        2019-04-04  30938   INTEREST

    2018-07-04  5308    INTEREST        2020-04-06  30938   INTEREST

    2018-10-04  5367    INTEREST        2021-04-06  30938   INTEREST

    2019-01-04  5367    INTEREST        2022-04-04  30938   INTEREST

    2019-04-04  5250    INTEREST        2023-04-04  30938   INTEREST

    2019-07-04  5308    INTEREST        2023-04-04  540000  FINAL_EX

    2019-10-04  5367    INTEREST                

    2020-01-06  5367    INTEREST                

    2020-04-06  5308    INTEREST                

    2020-07-06  5308    INTEREST                

    2020-10-05  5367    INTEREST                

    2021-01-04  5367    INTEREST                

    2021-04-06  5250    INTEREST                

    2021-07-05  5308    INTEREST                

    2021-10-04  5367    INTEREST                

    2021-10-04  315000  FINAL_EX

In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.

Here is how I want the output to be structured after the transformation.

    swap_id leg date        cf      type

    id1     pay 2017-04-04  5250    INTEREST

    id1     pay 2017-07-04  5308    INTEREST

    id1     pay 2017-10-04  5367    INTEREST

    id1     pay 2018-01-04  5367    INTEREST

    id1     pay 2018-04-04  5250    INTEREST

    id1     pay 2018-07-04  5308    INTEREST

    id1     pay 2018-10-04  5367    INTEREST

    id1     pay 2019-01-04  5367    INTEREST

    id1     pay 2019-04-04  5250    INTEREST

    id1     pay 2019-07-04  5308    INTEREST

    id1     pay 2019-10-04  5367    INTEREST

    id1     pay 2020-01-06  5367    INTEREST

    id1     pay 2020-04-06  5308    INTEREST

    id1     pay 2020-07-06  5308    INTEREST

    id1     pay 2020-10-05  5367    INTEREST

    id1     pay 2021-01-04  5367    INTEREST

    id1     pay 2021-04-06  5250    INTEREST

    id1     pay 2021-07-05  5308    INTEREST

    id1     pay 2021-10-04  5367    INTEREST

    id1     pay 2021-10-04  315000  FINAL_EX

    id1     rec 2017-04-04  42961   INTEREST

    id1     rec 2017-07-04  43438   INTEREST

    id1     rec 2017-10-04  43915   INTEREST

    id1     rec 2018-01-04  43915   INTEREST

    id1     rec 2018-04-04  42961   INTEREST

    id1     rec 2018-07-04  43438   INTEREST

    id1     rec 2018-10-04  43915   INTEREST

    id1     rec 2019-01-04  43915   INTEREST

    id1     rec 2019-04-04  42961   INTEREST

    id1     rec 2019-07-04  43438   INTEREST

    id1     rec 2019-10-04  43915   INTEREST

    id1     rec 2020-01-07  43915   INTEREST

    id1     rec 2020-04-06  43438   INTEREST

    id1     rec 2020-07-06  43438   INTEREST

    id1     rec 2020-10-05  43915   INTEREST

    id1     rec 2021-01-04  43915   INTEREST

    id1     rec 2021-04-06  42961   INTEREST

    id1     rec 2021-07-05  43438   INTEREST

    id1     rec 2021-10-04  43915   INTEREST

    id1     rec 2021-10-04  2988563 FINAL_EX

    id2 …   …   …   … and so on...

asked May 24 '18 at 10:01

jacob

148110

The definition of tidy here is that the output has the following columns:

swap_id a unique identifier.

leg either pay or rec.

cashflow the amount that is recieved

date the date the cashflow arrives

type whether the cashflow is of INTEREST of FINAL_EX.

The variable leg takes on two different values: pay and rec. Currently leg=pay is in one excel sheet and leg=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.

I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w

Here is a sample of the content of sheet "rec":

    id1                                 id2         

    date        cf      type            date        cf        type  

    2017-04-04  42961   INTEREST        2015-04-07  33953   INTEREST    

    2017-07-04  43438   INTEREST        2016-04-04  203161  INTEREST    

    2017-10-04  43915   INTEREST        2017-04-04  203161  INTEREST    

    2018-01-04  43915   INTEREST        2018-04-04  203161  INTEREST    

    2018-04-04  42961   INTEREST        2019-04-04  203161  INTEREST    

    2018-07-04  43438   INTEREST        2020-04-06  203161  INTEREST    

    2018-10-04  43915   INTEREST        2021-04-06  203161  INTEREST    

    2019-01-04  43915   INTEREST        2022-04-04  203161  INTEREST    

    2019-04-04  42961   INTEREST        2023-04-04  203161  INTEREST    

    2019-07-04  43438   INTEREST        2023-04-04  5016330 FINAL_EX    

    2019-10-04  43915   INTEREST                    

    2020-01-07  43915   INTEREST                    

    2020-04-06  43438   INTEREST                    

    2020-07-06  43438   INTEREST                    

    2020-10-05  43915   INTEREST                    

    2021-01-04  43915   INTEREST                    

    2021-04-06  42961   INTEREST                    

    2021-07-05  43438   INTEREST                    

    2021-10-04  43915   INTEREST                    

    2021-10-04  2988563 FINAL_EX

Here is a sample of the content of sheet "pay":

    id1                                 id2     

    date        cf      type            date        cf      type

    2017-04-04  5250    INTEREST        2015-04-07  30938   INTEREST

    2017-07-04  5308    INTEREST        2016-04-04  30938   INTEREST

    2017-10-04  5367    INTEREST        2017-04-04  30938   INTEREST

    2018-01-04  5367    INTEREST        2018-04-04  30938   INTEREST

    2018-04-04  5250    INTEREST        2019-04-04  30938   INTEREST

    2018-07-04  5308    INTEREST        2020-04-06  30938   INTEREST

    2018-10-04  5367    INTEREST        2021-04-06  30938   INTEREST

    2019-01-04  5367    INTEREST        2022-04-04  30938   INTEREST

    2019-04-04  5250    INTEREST        2023-04-04  30938   INTEREST

    2019-07-04  5308    INTEREST        2023-04-04  540000  FINAL_EX

    2019-10-04  5367    INTEREST                

    2020-01-06  5367    INTEREST                

    2020-04-06  5308    INTEREST                

    2020-07-06  5308    INTEREST                

    2020-10-05  5367    INTEREST                

    2021-01-04  5367    INTEREST                

    2021-04-06  5250    INTEREST                

    2021-07-05  5308    INTEREST                

    2021-10-04  5367    INTEREST                

    2021-10-04  315000  FINAL_EX

In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.

Here is how I want the output to be structured after the transformation.

    swap_id leg date        cf      type

    id1     pay 2017-04-04  5250    INTEREST

    id1     pay 2017-07-04  5308    INTEREST

    id1     pay 2017-10-04  5367    INTEREST

    id1     pay 2018-01-04  5367    INTEREST

    id1     pay 2018-04-04  5250    INTEREST

    id1     pay 2018-07-04  5308    INTEREST

    id1     pay 2018-10-04  5367    INTEREST

    id1     pay 2019-01-04  5367    INTEREST

    id1     pay 2019-04-04  5250    INTEREST

    id1     pay 2019-07-04  5308    INTEREST

    id1     pay 2019-10-04  5367    INTEREST

    id1     pay 2020-01-06  5367    INTEREST

    id1     pay 2020-04-06  5308    INTEREST

    id1     pay 2020-07-06  5308    INTEREST

    id1     pay 2020-10-05  5367    INTEREST

    id1     pay 2021-01-04  5367    INTEREST

    id1     pay 2021-04-06  5250    INTEREST

    id1     pay 2021-07-05  5308    INTEREST

    id1     pay 2021-10-04  5367    INTEREST

    id1     pay 2021-10-04  315000  FINAL_EX

    id1     rec 2017-04-04  42961   INTEREST

    id1     rec 2017-07-04  43438   INTEREST

    id1     rec 2017-10-04  43915   INTEREST

    id1     rec 2018-01-04  43915   INTEREST

    id1     rec 2018-04-04  42961   INTEREST

    id1     rec 2018-07-04  43438   INTEREST

    id1     rec 2018-10-04  43915   INTEREST

    id1     rec 2019-01-04  43915   INTEREST

    id1     rec 2019-04-04  42961   INTEREST

    id1     rec 2019-07-04  43438   INTEREST

    id1     rec 2019-10-04  43915   INTEREST

    id1     rec 2020-01-07  43915   INTEREST

    id1     rec 2020-04-06  43438   INTEREST

    id1     rec 2020-07-06  43438   INTEREST

    id1     rec 2020-10-05  43915   INTEREST

    id1     rec 2021-01-04  43915   INTEREST

    id1     rec 2021-04-06  42961   INTEREST

    id1     rec 2021-07-05  43438   INTEREST

    id1     rec 2021-10-04  43915   INTEREST

    id1     rec 2021-10-04  2988563 FINAL_EX

    id2 …   …   …   … and so on...

r data-cleaning

asked May 24 '18 at 10:01

jacob

148110

asked May 24 '18 at 10:01

jacob

148110

asked May 24 '18 at 10:01

jacob

148110

asked May 24 '18 at 10:01

jacob

148110

asked May 24 '18 at 10:01

jacob

148110

bumped to the homepage by Community♦ 4 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 4 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57

add a comment |

$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57

Joining on the primary keys?

– Aditya
May 24 '18 at 11:57

add a comment |

1 Answer
1

active

oldest

votes

## read file 

# create path 

file4 <- "GetSwapFlowdb2.xlsm"

filepath3 <- paste(folder1, file4, sep="")

# read. sheet 2 is recieve leg, and 3 is pay leg

db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()

db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame() 



## save swap IDs 

(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()

transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()

stopifnot(transid_list_pay == transid_list)

rm(transid_list_pay)

(nr_id <- length(transid_list))



## make function that cleans a single swap

db_clean_oneswap <- function(aswap){

  # input...

  # aswap: a single swap in the form fo data.frame

  # output...

  # aswpap: the cleaned version



  # format cf 

  aswap$cf <- as.numeric(aswap$cf)

  # format dates from excel format to yyyy-mm-dd

  aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))

  aswap$date <- as.Date(aswap$date)

  # create t 

  aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")

  # save the swap to our list 

  return(aswap)

}





## make function so we can save swap cashflows to a list 

db_create_swaplist <- function(data, transids){

  # input... 

  # data: the dataframe which comes from excel 

  # transids: a list character vector of swap trans IDs

  # output... 

  # a list with each element containing a dataframe of the swap 



  # create empty list

  swap_list <- NULL



  # for loop to create a list containing each swap. 

  for (id in transids){

    # create df we will "eat" from 

    data <- db_rec

    # select three cols, save to a swap

    one_swap <- data[-1, c(1,2,3)]

    # remove those cols we saved 

    data <- data[-c(1,2,3)]

    # rename cols 

    names(one_swap) <- c("date", "cf", "type")

    # remove a row, remove na 

    one_swap <- one_swap[-1, ]

    one_swap <- na.omit(one_swap)



    # clean it

    one_swap <- db_clean_oneswap(one_swap)

    # save the swap to our list 

    swap_list[[id]] <- one_swap

  }

  return(swap_list)

}



## create a list for rec and pay

swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)

swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)



## ways of accessing the lists

swaps_rec[[1]]

transid_list[1]

swaps_rec[[ transid_list[2] ]]

swaps_rec[[as.character(transid)]]

answered May 24 '18 at 13:32

jacob

148110

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f32099%2fconvert-from-many-sub-tables-to-a-single-tidy-dataframe%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

## read file 

# create path 

file4 <- "GetSwapFlowdb2.xlsm"

filepath3 <- paste(folder1, file4, sep="")

# read. sheet 2 is recieve leg, and 3 is pay leg

db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()

db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame() 



## save swap IDs 

(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()

transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()

stopifnot(transid_list_pay == transid_list)

rm(transid_list_pay)

(nr_id <- length(transid_list))



## make function that cleans a single swap

db_clean_oneswap <- function(aswap){

  # input...

  # aswap: a single swap in the form fo data.frame

  # output...

  # aswpap: the cleaned version



  # format cf 

  aswap$cf <- as.numeric(aswap$cf)

  # format dates from excel format to yyyy-mm-dd

  aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))

  aswap$date <- as.Date(aswap$date)

  # create t 

  aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")

  # save the swap to our list 

  return(aswap)

}





## make function so we can save swap cashflows to a list 

db_create_swaplist <- function(data, transids){

  # input... 

  # data: the dataframe which comes from excel 

  # transids: a list character vector of swap trans IDs

  # output... 

  # a list with each element containing a dataframe of the swap 



  # create empty list

  swap_list <- NULL



  # for loop to create a list containing each swap. 

  for (id in transids){

    # create df we will "eat" from 

    data <- db_rec

    # select three cols, save to a swap

    one_swap <- data[-1, c(1,2,3)]

    # remove those cols we saved 

    data <- data[-c(1,2,3)]

    # rename cols 

    names(one_swap) <- c("date", "cf", "type")

    # remove a row, remove na 

    one_swap <- one_swap[-1, ]

    one_swap <- na.omit(one_swap)



    # clean it

    one_swap <- db_clean_oneswap(one_swap)

    # save the swap to our list 

    swap_list[[id]] <- one_swap

  }

  return(swap_list)

}



## create a list for rec and pay

swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)

swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)



## ways of accessing the lists

swaps_rec[[1]]

transid_list[1]

swaps_rec[[ transid_list[2] ]]

swaps_rec[[as.character(transid)]]

answered May 24 '18 at 13:32

jacob

148110

add a comment |

## read file 

# create path 

file4 <- "GetSwapFlowdb2.xlsm"

filepath3 <- paste(folder1, file4, sep="")

# read. sheet 2 is recieve leg, and 3 is pay leg

db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()

db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame() 



## save swap IDs 

(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()

transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()

stopifnot(transid_list_pay == transid_list)

rm(transid_list_pay)

(nr_id <- length(transid_list))



## make function that cleans a single swap

db_clean_oneswap <- function(aswap){

  # input...

  # aswap: a single swap in the form fo data.frame

  # output...

  # aswpap: the cleaned version



  # format cf 

  aswap$cf <- as.numeric(aswap$cf)

  # format dates from excel format to yyyy-mm-dd

  aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))

  aswap$date <- as.Date(aswap$date)

  # create t 

  aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")

  # save the swap to our list 

  return(aswap)

}





## make function so we can save swap cashflows to a list 

db_create_swaplist <- function(data, transids){

  # input... 

  # data: the dataframe which comes from excel 

  # transids: a list character vector of swap trans IDs

  # output... 

  # a list with each element containing a dataframe of the swap 



  # create empty list

  swap_list <- NULL



  # for loop to create a list containing each swap. 

  for (id in transids){

    # create df we will "eat" from 

    data <- db_rec

    # select three cols, save to a swap

    one_swap <- data[-1, c(1,2,3)]

    # remove those cols we saved 

    data <- data[-c(1,2,3)]

    # rename cols 

    names(one_swap) <- c("date", "cf", "type")

    # remove a row, remove na 

    one_swap <- one_swap[-1, ]

    one_swap <- na.omit(one_swap)



    # clean it

    one_swap <- db_clean_oneswap(one_swap)

    # save the swap to our list 

    swap_list[[id]] <- one_swap

  }

  return(swap_list)

}



## create a list for rec and pay

swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)

swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)



## ways of accessing the lists

swaps_rec[[1]]

transid_list[1]

swaps_rec[[ transid_list[2] ]]

swaps_rec[[as.character(transid)]]

answered May 24 '18 at 13:32

jacob

148110

add a comment |

## read file 

# create path 

file4 <- "GetSwapFlowdb2.xlsm"

filepath3 <- paste(folder1, file4, sep="")

# read. sheet 2 is recieve leg, and 3 is pay leg

db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()

db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame() 



## save swap IDs 

(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()

transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()

stopifnot(transid_list_pay == transid_list)

rm(transid_list_pay)

(nr_id <- length(transid_list))



## make function that cleans a single swap

db_clean_oneswap <- function(aswap){

  # input...

  # aswap: a single swap in the form fo data.frame

  # output...

  # aswpap: the cleaned version



  # format cf 

  aswap$cf <- as.numeric(aswap$cf)

  # format dates from excel format to yyyy-mm-dd

  aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))

  aswap$date <- as.Date(aswap$date)

  # create t 

  aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")

  # save the swap to our list 

  return(aswap)

}





## make function so we can save swap cashflows to a list 

db_create_swaplist <- function(data, transids){

  # input... 

  # data: the dataframe which comes from excel 

  # transids: a list character vector of swap trans IDs

  # output... 

  # a list with each element containing a dataframe of the swap 



  # create empty list

  swap_list <- NULL



  # for loop to create a list containing each swap. 

  for (id in transids){

    # create df we will "eat" from 

    data <- db_rec

    # select three cols, save to a swap

    one_swap <- data[-1, c(1,2,3)]

    # remove those cols we saved 

    data <- data[-c(1,2,3)]

    # rename cols 

    names(one_swap) <- c("date", "cf", "type")

    # remove a row, remove na 

    one_swap <- one_swap[-1, ]

    one_swap <- na.omit(one_swap)



    # clean it

    one_swap <- db_clean_oneswap(one_swap)

    # save the swap to our list 

    swap_list[[id]] <- one_swap

  }

  return(swap_list)

}



## create a list for rec and pay

swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)

swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)



## ways of accessing the lists

swaps_rec[[1]]

transid_list[1]

swaps_rec[[ transid_list[2] ]]

swaps_rec[[as.character(transid)]]

answered May 24 '18 at 13:32

jacob

148110

## read file 

# create path 

file4 <- "GetSwapFlowdb2.xlsm"

filepath3 <- paste(folder1, file4, sep="")

# read. sheet 2 is recieve leg, and 3 is pay leg

db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()

db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame() 



## save swap IDs 

(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()

transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()

stopifnot(transid_list_pay == transid_list)

rm(transid_list_pay)

(nr_id <- length(transid_list))



## make function that cleans a single swap

db_clean_oneswap <- function(aswap){

  # input...

  # aswap: a single swap in the form fo data.frame

  # output...

  # aswpap: the cleaned version



  # format cf 

  aswap$cf <- as.numeric(aswap$cf)

  # format dates from excel format to yyyy-mm-dd

  aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))

  aswap$date <- as.Date(aswap$date)

  # create t 

  aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")

  # save the swap to our list 

  return(aswap)

}





## make function so we can save swap cashflows to a list 

db_create_swaplist <- function(data, transids){

  # input... 

  # data: the dataframe which comes from excel 

  # transids: a list character vector of swap trans IDs

  # output... 

  # a list with each element containing a dataframe of the swap 



  # create empty list

  swap_list <- NULL



  # for loop to create a list containing each swap. 

  for (id in transids){

    # create df we will "eat" from 

    data <- db_rec

    # select three cols, save to a swap

    one_swap <- data[-1, c(1,2,3)]

    # remove those cols we saved 

    data <- data[-c(1,2,3)]

    # rename cols 

    names(one_swap) <- c("date", "cf", "type")

    # remove a row, remove na 

    one_swap <- one_swap[-1, ]

    one_swap <- na.omit(one_swap)



    # clean it

    one_swap <- db_clean_oneswap(one_swap)

    # save the swap to our list 

    swap_list[[id]] <- one_swap

  }

  return(swap_list)

}



## create a list for rec and pay

swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)

swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)



## ways of accessing the lists

swaps_rec[[1]]

transid_list[1]

swaps_rec[[ transid_list[2] ]]

swaps_rec[[as.character(transid)]]

answered May 24 '18 at 13:32

jacob

148110

answered May 24 '18 at 13:32

jacob

148110

answered May 24 '18 at 13:32

jacob

148110

answered May 24 '18 at 13:32

jacob

148110

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

6nRmc2,3hiDrQ,l20Y01VCAKjj,pb1,4 CK

搜尋此網誌

Htydjtk