Convert from many sub-tables to a single tidy dataframe












2












$begingroup$


I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.



The definition of tidy here is that the output has the following columns:





  • swap_id a unique identifier.


  • leg either pay or rec.


  • cashflow the amount that is recieved


  • date the date the cashflow arrives


  • type whether the cashflow is of INTEREST of FINAL_EX.


The variable leg takes on two different values: pay and rec. Currently leg=pay is in one excel sheet and leg=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.



I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w



Here is a sample of the content of sheet "rec":



    id1                                 id2         
date cf type date cf type
2017-04-04 42961 INTEREST 2015-04-07 33953 INTEREST
2017-07-04 43438 INTEREST 2016-04-04 203161 INTEREST
2017-10-04 43915 INTEREST 2017-04-04 203161 INTEREST
2018-01-04 43915 INTEREST 2018-04-04 203161 INTEREST
2018-04-04 42961 INTEREST 2019-04-04 203161 INTEREST
2018-07-04 43438 INTEREST 2020-04-06 203161 INTEREST
2018-10-04 43915 INTEREST 2021-04-06 203161 INTEREST
2019-01-04 43915 INTEREST 2022-04-04 203161 INTEREST
2019-04-04 42961 INTEREST 2023-04-04 203161 INTEREST
2019-07-04 43438 INTEREST 2023-04-04 5016330 FINAL_EX
2019-10-04 43915 INTEREST
2020-01-07 43915 INTEREST
2020-04-06 43438 INTEREST
2020-07-06 43438 INTEREST
2020-10-05 43915 INTEREST
2021-01-04 43915 INTEREST
2021-04-06 42961 INTEREST
2021-07-05 43438 INTEREST
2021-10-04 43915 INTEREST
2021-10-04 2988563 FINAL_EX


Here is a sample of the content of sheet "pay":



    id1                                 id2     
date cf type date cf type
2017-04-04 5250 INTEREST 2015-04-07 30938 INTEREST
2017-07-04 5308 INTEREST 2016-04-04 30938 INTEREST
2017-10-04 5367 INTEREST 2017-04-04 30938 INTEREST
2018-01-04 5367 INTEREST 2018-04-04 30938 INTEREST
2018-04-04 5250 INTEREST 2019-04-04 30938 INTEREST
2018-07-04 5308 INTEREST 2020-04-06 30938 INTEREST
2018-10-04 5367 INTEREST 2021-04-06 30938 INTEREST
2019-01-04 5367 INTEREST 2022-04-04 30938 INTEREST
2019-04-04 5250 INTEREST 2023-04-04 30938 INTEREST
2019-07-04 5308 INTEREST 2023-04-04 540000 FINAL_EX
2019-10-04 5367 INTEREST
2020-01-06 5367 INTEREST
2020-04-06 5308 INTEREST
2020-07-06 5308 INTEREST
2020-10-05 5367 INTEREST
2021-01-04 5367 INTEREST
2021-04-06 5250 INTEREST
2021-07-05 5308 INTEREST
2021-10-04 5367 INTEREST
2021-10-04 315000 FINAL_EX


In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.



Here is how I want the output to be structured after the transformation.



    swap_id leg date        cf      type
id1 pay 2017-04-04 5250 INTEREST
id1 pay 2017-07-04 5308 INTEREST
id1 pay 2017-10-04 5367 INTEREST
id1 pay 2018-01-04 5367 INTEREST
id1 pay 2018-04-04 5250 INTEREST
id1 pay 2018-07-04 5308 INTEREST
id1 pay 2018-10-04 5367 INTEREST
id1 pay 2019-01-04 5367 INTEREST
id1 pay 2019-04-04 5250 INTEREST
id1 pay 2019-07-04 5308 INTEREST
id1 pay 2019-10-04 5367 INTEREST
id1 pay 2020-01-06 5367 INTEREST
id1 pay 2020-04-06 5308 INTEREST
id1 pay 2020-07-06 5308 INTEREST
id1 pay 2020-10-05 5367 INTEREST
id1 pay 2021-01-04 5367 INTEREST
id1 pay 2021-04-06 5250 INTEREST
id1 pay 2021-07-05 5308 INTEREST
id1 pay 2021-10-04 5367 INTEREST
id1 pay 2021-10-04 315000 FINAL_EX
id1 rec 2017-04-04 42961 INTEREST
id1 rec 2017-07-04 43438 INTEREST
id1 rec 2017-10-04 43915 INTEREST
id1 rec 2018-01-04 43915 INTEREST
id1 rec 2018-04-04 42961 INTEREST
id1 rec 2018-07-04 43438 INTEREST
id1 rec 2018-10-04 43915 INTEREST
id1 rec 2019-01-04 43915 INTEREST
id1 rec 2019-04-04 42961 INTEREST
id1 rec 2019-07-04 43438 INTEREST
id1 rec 2019-10-04 43915 INTEREST
id1 rec 2020-01-07 43915 INTEREST
id1 rec 2020-04-06 43438 INTEREST
id1 rec 2020-07-06 43438 INTEREST
id1 rec 2020-10-05 43915 INTEREST
id1 rec 2021-01-04 43915 INTEREST
id1 rec 2021-04-06 42961 INTEREST
id1 rec 2021-07-05 43438 INTEREST
id1 rec 2021-10-04 43915 INTEREST
id1 rec 2021-10-04 2988563 FINAL_EX
id2 … … … … and so on...









share|improve this question









$endgroup$




bumped to the homepage by Community 4 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • $begingroup$
    Joining on the primary keys?
    $endgroup$
    – Aditya
    May 24 '18 at 11:57
















2












$begingroup$


I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.



The definition of tidy here is that the output has the following columns:





  • swap_id a unique identifier.


  • leg either pay or rec.


  • cashflow the amount that is recieved


  • date the date the cashflow arrives


  • type whether the cashflow is of INTEREST of FINAL_EX.


The variable leg takes on two different values: pay and rec. Currently leg=pay is in one excel sheet and leg=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.



I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w



Here is a sample of the content of sheet "rec":



    id1                                 id2         
date cf type date cf type
2017-04-04 42961 INTEREST 2015-04-07 33953 INTEREST
2017-07-04 43438 INTEREST 2016-04-04 203161 INTEREST
2017-10-04 43915 INTEREST 2017-04-04 203161 INTEREST
2018-01-04 43915 INTEREST 2018-04-04 203161 INTEREST
2018-04-04 42961 INTEREST 2019-04-04 203161 INTEREST
2018-07-04 43438 INTEREST 2020-04-06 203161 INTEREST
2018-10-04 43915 INTEREST 2021-04-06 203161 INTEREST
2019-01-04 43915 INTEREST 2022-04-04 203161 INTEREST
2019-04-04 42961 INTEREST 2023-04-04 203161 INTEREST
2019-07-04 43438 INTEREST 2023-04-04 5016330 FINAL_EX
2019-10-04 43915 INTEREST
2020-01-07 43915 INTEREST
2020-04-06 43438 INTEREST
2020-07-06 43438 INTEREST
2020-10-05 43915 INTEREST
2021-01-04 43915 INTEREST
2021-04-06 42961 INTEREST
2021-07-05 43438 INTEREST
2021-10-04 43915 INTEREST
2021-10-04 2988563 FINAL_EX


Here is a sample of the content of sheet "pay":



    id1                                 id2     
date cf type date cf type
2017-04-04 5250 INTEREST 2015-04-07 30938 INTEREST
2017-07-04 5308 INTEREST 2016-04-04 30938 INTEREST
2017-10-04 5367 INTEREST 2017-04-04 30938 INTEREST
2018-01-04 5367 INTEREST 2018-04-04 30938 INTEREST
2018-04-04 5250 INTEREST 2019-04-04 30938 INTEREST
2018-07-04 5308 INTEREST 2020-04-06 30938 INTEREST
2018-10-04 5367 INTEREST 2021-04-06 30938 INTEREST
2019-01-04 5367 INTEREST 2022-04-04 30938 INTEREST
2019-04-04 5250 INTEREST 2023-04-04 30938 INTEREST
2019-07-04 5308 INTEREST 2023-04-04 540000 FINAL_EX
2019-10-04 5367 INTEREST
2020-01-06 5367 INTEREST
2020-04-06 5308 INTEREST
2020-07-06 5308 INTEREST
2020-10-05 5367 INTEREST
2021-01-04 5367 INTEREST
2021-04-06 5250 INTEREST
2021-07-05 5308 INTEREST
2021-10-04 5367 INTEREST
2021-10-04 315000 FINAL_EX


In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.



Here is how I want the output to be structured after the transformation.



    swap_id leg date        cf      type
id1 pay 2017-04-04 5250 INTEREST
id1 pay 2017-07-04 5308 INTEREST
id1 pay 2017-10-04 5367 INTEREST
id1 pay 2018-01-04 5367 INTEREST
id1 pay 2018-04-04 5250 INTEREST
id1 pay 2018-07-04 5308 INTEREST
id1 pay 2018-10-04 5367 INTEREST
id1 pay 2019-01-04 5367 INTEREST
id1 pay 2019-04-04 5250 INTEREST
id1 pay 2019-07-04 5308 INTEREST
id1 pay 2019-10-04 5367 INTEREST
id1 pay 2020-01-06 5367 INTEREST
id1 pay 2020-04-06 5308 INTEREST
id1 pay 2020-07-06 5308 INTEREST
id1 pay 2020-10-05 5367 INTEREST
id1 pay 2021-01-04 5367 INTEREST
id1 pay 2021-04-06 5250 INTEREST
id1 pay 2021-07-05 5308 INTEREST
id1 pay 2021-10-04 5367 INTEREST
id1 pay 2021-10-04 315000 FINAL_EX
id1 rec 2017-04-04 42961 INTEREST
id1 rec 2017-07-04 43438 INTEREST
id1 rec 2017-10-04 43915 INTEREST
id1 rec 2018-01-04 43915 INTEREST
id1 rec 2018-04-04 42961 INTEREST
id1 rec 2018-07-04 43438 INTEREST
id1 rec 2018-10-04 43915 INTEREST
id1 rec 2019-01-04 43915 INTEREST
id1 rec 2019-04-04 42961 INTEREST
id1 rec 2019-07-04 43438 INTEREST
id1 rec 2019-10-04 43915 INTEREST
id1 rec 2020-01-07 43915 INTEREST
id1 rec 2020-04-06 43438 INTEREST
id1 rec 2020-07-06 43438 INTEREST
id1 rec 2020-10-05 43915 INTEREST
id1 rec 2021-01-04 43915 INTEREST
id1 rec 2021-04-06 42961 INTEREST
id1 rec 2021-07-05 43438 INTEREST
id1 rec 2021-10-04 43915 INTEREST
id1 rec 2021-10-04 2988563 FINAL_EX
id2 … … … … and so on...









share|improve this question









$endgroup$




bumped to the homepage by Community 4 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • $begingroup$
    Joining on the primary keys?
    $endgroup$
    – Aditya
    May 24 '18 at 11:57














2












2








2





$begingroup$


I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.



The definition of tidy here is that the output has the following columns:





  • swap_id a unique identifier.


  • leg either pay or rec.


  • cashflow the amount that is recieved


  • date the date the cashflow arrives


  • type whether the cashflow is of INTEREST of FINAL_EX.


The variable leg takes on two different values: pay and rec. Currently leg=pay is in one excel sheet and leg=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.



I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w



Here is a sample of the content of sheet "rec":



    id1                                 id2         
date cf type date cf type
2017-04-04 42961 INTEREST 2015-04-07 33953 INTEREST
2017-07-04 43438 INTEREST 2016-04-04 203161 INTEREST
2017-10-04 43915 INTEREST 2017-04-04 203161 INTEREST
2018-01-04 43915 INTEREST 2018-04-04 203161 INTEREST
2018-04-04 42961 INTEREST 2019-04-04 203161 INTEREST
2018-07-04 43438 INTEREST 2020-04-06 203161 INTEREST
2018-10-04 43915 INTEREST 2021-04-06 203161 INTEREST
2019-01-04 43915 INTEREST 2022-04-04 203161 INTEREST
2019-04-04 42961 INTEREST 2023-04-04 203161 INTEREST
2019-07-04 43438 INTEREST 2023-04-04 5016330 FINAL_EX
2019-10-04 43915 INTEREST
2020-01-07 43915 INTEREST
2020-04-06 43438 INTEREST
2020-07-06 43438 INTEREST
2020-10-05 43915 INTEREST
2021-01-04 43915 INTEREST
2021-04-06 42961 INTEREST
2021-07-05 43438 INTEREST
2021-10-04 43915 INTEREST
2021-10-04 2988563 FINAL_EX


Here is a sample of the content of sheet "pay":



    id1                                 id2     
date cf type date cf type
2017-04-04 5250 INTEREST 2015-04-07 30938 INTEREST
2017-07-04 5308 INTEREST 2016-04-04 30938 INTEREST
2017-10-04 5367 INTEREST 2017-04-04 30938 INTEREST
2018-01-04 5367 INTEREST 2018-04-04 30938 INTEREST
2018-04-04 5250 INTEREST 2019-04-04 30938 INTEREST
2018-07-04 5308 INTEREST 2020-04-06 30938 INTEREST
2018-10-04 5367 INTEREST 2021-04-06 30938 INTEREST
2019-01-04 5367 INTEREST 2022-04-04 30938 INTEREST
2019-04-04 5250 INTEREST 2023-04-04 30938 INTEREST
2019-07-04 5308 INTEREST 2023-04-04 540000 FINAL_EX
2019-10-04 5367 INTEREST
2020-01-06 5367 INTEREST
2020-04-06 5308 INTEREST
2020-07-06 5308 INTEREST
2020-10-05 5367 INTEREST
2021-01-04 5367 INTEREST
2021-04-06 5250 INTEREST
2021-07-05 5308 INTEREST
2021-10-04 5367 INTEREST
2021-10-04 315000 FINAL_EX


In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.



Here is how I want the output to be structured after the transformation.



    swap_id leg date        cf      type
id1 pay 2017-04-04 5250 INTEREST
id1 pay 2017-07-04 5308 INTEREST
id1 pay 2017-10-04 5367 INTEREST
id1 pay 2018-01-04 5367 INTEREST
id1 pay 2018-04-04 5250 INTEREST
id1 pay 2018-07-04 5308 INTEREST
id1 pay 2018-10-04 5367 INTEREST
id1 pay 2019-01-04 5367 INTEREST
id1 pay 2019-04-04 5250 INTEREST
id1 pay 2019-07-04 5308 INTEREST
id1 pay 2019-10-04 5367 INTEREST
id1 pay 2020-01-06 5367 INTEREST
id1 pay 2020-04-06 5308 INTEREST
id1 pay 2020-07-06 5308 INTEREST
id1 pay 2020-10-05 5367 INTEREST
id1 pay 2021-01-04 5367 INTEREST
id1 pay 2021-04-06 5250 INTEREST
id1 pay 2021-07-05 5308 INTEREST
id1 pay 2021-10-04 5367 INTEREST
id1 pay 2021-10-04 315000 FINAL_EX
id1 rec 2017-04-04 42961 INTEREST
id1 rec 2017-07-04 43438 INTEREST
id1 rec 2017-10-04 43915 INTEREST
id1 rec 2018-01-04 43915 INTEREST
id1 rec 2018-04-04 42961 INTEREST
id1 rec 2018-07-04 43438 INTEREST
id1 rec 2018-10-04 43915 INTEREST
id1 rec 2019-01-04 43915 INTEREST
id1 rec 2019-04-04 42961 INTEREST
id1 rec 2019-07-04 43438 INTEREST
id1 rec 2019-10-04 43915 INTEREST
id1 rec 2020-01-07 43915 INTEREST
id1 rec 2020-04-06 43438 INTEREST
id1 rec 2020-07-06 43438 INTEREST
id1 rec 2020-10-05 43915 INTEREST
id1 rec 2021-01-04 43915 INTEREST
id1 rec 2021-04-06 42961 INTEREST
id1 rec 2021-07-05 43438 INTEREST
id1 rec 2021-10-04 43915 INTEREST
id1 rec 2021-10-04 2988563 FINAL_EX
id2 … … … … and so on...









share|improve this question









$endgroup$




I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.



The definition of tidy here is that the output has the following columns:





  • swap_id a unique identifier.


  • leg either pay or rec.


  • cashflow the amount that is recieved


  • date the date the cashflow arrives


  • type whether the cashflow is of INTEREST of FINAL_EX.


The variable leg takes on two different values: pay and rec. Currently leg=pay is in one excel sheet and leg=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.



I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w



Here is a sample of the content of sheet "rec":



    id1                                 id2         
date cf type date cf type
2017-04-04 42961 INTEREST 2015-04-07 33953 INTEREST
2017-07-04 43438 INTEREST 2016-04-04 203161 INTEREST
2017-10-04 43915 INTEREST 2017-04-04 203161 INTEREST
2018-01-04 43915 INTEREST 2018-04-04 203161 INTEREST
2018-04-04 42961 INTEREST 2019-04-04 203161 INTEREST
2018-07-04 43438 INTEREST 2020-04-06 203161 INTEREST
2018-10-04 43915 INTEREST 2021-04-06 203161 INTEREST
2019-01-04 43915 INTEREST 2022-04-04 203161 INTEREST
2019-04-04 42961 INTEREST 2023-04-04 203161 INTEREST
2019-07-04 43438 INTEREST 2023-04-04 5016330 FINAL_EX
2019-10-04 43915 INTEREST
2020-01-07 43915 INTEREST
2020-04-06 43438 INTEREST
2020-07-06 43438 INTEREST
2020-10-05 43915 INTEREST
2021-01-04 43915 INTEREST
2021-04-06 42961 INTEREST
2021-07-05 43438 INTEREST
2021-10-04 43915 INTEREST
2021-10-04 2988563 FINAL_EX


Here is a sample of the content of sheet "pay":



    id1                                 id2     
date cf type date cf type
2017-04-04 5250 INTEREST 2015-04-07 30938 INTEREST
2017-07-04 5308 INTEREST 2016-04-04 30938 INTEREST
2017-10-04 5367 INTEREST 2017-04-04 30938 INTEREST
2018-01-04 5367 INTEREST 2018-04-04 30938 INTEREST
2018-04-04 5250 INTEREST 2019-04-04 30938 INTEREST
2018-07-04 5308 INTEREST 2020-04-06 30938 INTEREST
2018-10-04 5367 INTEREST 2021-04-06 30938 INTEREST
2019-01-04 5367 INTEREST 2022-04-04 30938 INTEREST
2019-04-04 5250 INTEREST 2023-04-04 30938 INTEREST
2019-07-04 5308 INTEREST 2023-04-04 540000 FINAL_EX
2019-10-04 5367 INTEREST
2020-01-06 5367 INTEREST
2020-04-06 5308 INTEREST
2020-07-06 5308 INTEREST
2020-10-05 5367 INTEREST
2021-01-04 5367 INTEREST
2021-04-06 5250 INTEREST
2021-07-05 5308 INTEREST
2021-10-04 5367 INTEREST
2021-10-04 315000 FINAL_EX


In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.



Here is how I want the output to be structured after the transformation.



    swap_id leg date        cf      type
id1 pay 2017-04-04 5250 INTEREST
id1 pay 2017-07-04 5308 INTEREST
id1 pay 2017-10-04 5367 INTEREST
id1 pay 2018-01-04 5367 INTEREST
id1 pay 2018-04-04 5250 INTEREST
id1 pay 2018-07-04 5308 INTEREST
id1 pay 2018-10-04 5367 INTEREST
id1 pay 2019-01-04 5367 INTEREST
id1 pay 2019-04-04 5250 INTEREST
id1 pay 2019-07-04 5308 INTEREST
id1 pay 2019-10-04 5367 INTEREST
id1 pay 2020-01-06 5367 INTEREST
id1 pay 2020-04-06 5308 INTEREST
id1 pay 2020-07-06 5308 INTEREST
id1 pay 2020-10-05 5367 INTEREST
id1 pay 2021-01-04 5367 INTEREST
id1 pay 2021-04-06 5250 INTEREST
id1 pay 2021-07-05 5308 INTEREST
id1 pay 2021-10-04 5367 INTEREST
id1 pay 2021-10-04 315000 FINAL_EX
id1 rec 2017-04-04 42961 INTEREST
id1 rec 2017-07-04 43438 INTEREST
id1 rec 2017-10-04 43915 INTEREST
id1 rec 2018-01-04 43915 INTEREST
id1 rec 2018-04-04 42961 INTEREST
id1 rec 2018-07-04 43438 INTEREST
id1 rec 2018-10-04 43915 INTEREST
id1 rec 2019-01-04 43915 INTEREST
id1 rec 2019-04-04 42961 INTEREST
id1 rec 2019-07-04 43438 INTEREST
id1 rec 2019-10-04 43915 INTEREST
id1 rec 2020-01-07 43915 INTEREST
id1 rec 2020-04-06 43438 INTEREST
id1 rec 2020-07-06 43438 INTEREST
id1 rec 2020-10-05 43915 INTEREST
id1 rec 2021-01-04 43915 INTEREST
id1 rec 2021-04-06 42961 INTEREST
id1 rec 2021-07-05 43438 INTEREST
id1 rec 2021-10-04 43915 INTEREST
id1 rec 2021-10-04 2988563 FINAL_EX
id2 … … … … and so on...






r data-cleaning






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked May 24 '18 at 10:01









jacobjacob

148110




148110





bumped to the homepage by Community 4 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 4 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.














  • $begingroup$
    Joining on the primary keys?
    $endgroup$
    – Aditya
    May 24 '18 at 11:57


















  • $begingroup$
    Joining on the primary keys?
    $endgroup$
    – Aditya
    May 24 '18 at 11:57
















$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57




$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57










1 Answer
1






active

oldest

votes


















0












$begingroup$

## read file 
# create path
file4 <- "GetSwapFlowdb2.xlsm"
filepath3 <- paste(folder1, file4, sep="")
# read. sheet 2 is recieve leg, and 3 is pay leg
db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()

## save swap IDs
(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
stopifnot(transid_list_pay == transid_list)
rm(transid_list_pay)
(nr_id <- length(transid_list))

## make function that cleans a single swap
db_clean_oneswap <- function(aswap){
# input...
# aswap: a single swap in the form fo data.frame
# output...
# aswpap: the cleaned version

# format cf
aswap$cf <- as.numeric(aswap$cf)
# format dates from excel format to yyyy-mm-dd
aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
aswap$date <- as.Date(aswap$date)
# create t
aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
# save the swap to our list
return(aswap)
}


## make function so we can save swap cashflows to a list
db_create_swaplist <- function(data, transids){
# input...
# data: the dataframe which comes from excel
# transids: a list character vector of swap trans IDs
# output...
# a list with each element containing a dataframe of the swap

# create empty list
swap_list <- NULL

# for loop to create a list containing each swap.
for (id in transids){
# create df we will "eat" from
data <- db_rec
# select three cols, save to a swap
one_swap <- data[-1, c(1,2,3)]
# remove those cols we saved
data <- data[-c(1,2,3)]
# rename cols
names(one_swap) <- c("date", "cf", "type")
# remove a row, remove na
one_swap <- one_swap[-1, ]
one_swap <- na.omit(one_swap)

# clean it
one_swap <- db_clean_oneswap(one_swap)
# save the swap to our list
swap_list[[id]] <- one_swap
}
return(swap_list)
}

## create a list for rec and pay
swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)

## ways of accessing the lists
swaps_rec[[1]]
transid_list[1]
swaps_rec[[ transid_list[2] ]]
swaps_rec[[as.character(transid)]]





share|improve this answer









$endgroup$














    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "557"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f32099%2fconvert-from-many-sub-tables-to-a-single-tidy-dataframe%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0












    $begingroup$

    ## read file 
    # create path
    file4 <- "GetSwapFlowdb2.xlsm"
    filepath3 <- paste(folder1, file4, sep="")
    # read. sheet 2 is recieve leg, and 3 is pay leg
    db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
    db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()

    ## save swap IDs
    (transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
    transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
    stopifnot(transid_list_pay == transid_list)
    rm(transid_list_pay)
    (nr_id <- length(transid_list))

    ## make function that cleans a single swap
    db_clean_oneswap <- function(aswap){
    # input...
    # aswap: a single swap in the form fo data.frame
    # output...
    # aswpap: the cleaned version

    # format cf
    aswap$cf <- as.numeric(aswap$cf)
    # format dates from excel format to yyyy-mm-dd
    aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
    aswap$date <- as.Date(aswap$date)
    # create t
    aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
    # save the swap to our list
    return(aswap)
    }


    ## make function so we can save swap cashflows to a list
    db_create_swaplist <- function(data, transids){
    # input...
    # data: the dataframe which comes from excel
    # transids: a list character vector of swap trans IDs
    # output...
    # a list with each element containing a dataframe of the swap

    # create empty list
    swap_list <- NULL

    # for loop to create a list containing each swap.
    for (id in transids){
    # create df we will "eat" from
    data <- db_rec
    # select three cols, save to a swap
    one_swap <- data[-1, c(1,2,3)]
    # remove those cols we saved
    data <- data[-c(1,2,3)]
    # rename cols
    names(one_swap) <- c("date", "cf", "type")
    # remove a row, remove na
    one_swap <- one_swap[-1, ]
    one_swap <- na.omit(one_swap)

    # clean it
    one_swap <- db_clean_oneswap(one_swap)
    # save the swap to our list
    swap_list[[id]] <- one_swap
    }
    return(swap_list)
    }

    ## create a list for rec and pay
    swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
    swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)

    ## ways of accessing the lists
    swaps_rec[[1]]
    transid_list[1]
    swaps_rec[[ transid_list[2] ]]
    swaps_rec[[as.character(transid)]]





    share|improve this answer









    $endgroup$


















      0












      $begingroup$

      ## read file 
      # create path
      file4 <- "GetSwapFlowdb2.xlsm"
      filepath3 <- paste(folder1, file4, sep="")
      # read. sheet 2 is recieve leg, and 3 is pay leg
      db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
      db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()

      ## save swap IDs
      (transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
      transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
      stopifnot(transid_list_pay == transid_list)
      rm(transid_list_pay)
      (nr_id <- length(transid_list))

      ## make function that cleans a single swap
      db_clean_oneswap <- function(aswap){
      # input...
      # aswap: a single swap in the form fo data.frame
      # output...
      # aswpap: the cleaned version

      # format cf
      aswap$cf <- as.numeric(aswap$cf)
      # format dates from excel format to yyyy-mm-dd
      aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
      aswap$date <- as.Date(aswap$date)
      # create t
      aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
      # save the swap to our list
      return(aswap)
      }


      ## make function so we can save swap cashflows to a list
      db_create_swaplist <- function(data, transids){
      # input...
      # data: the dataframe which comes from excel
      # transids: a list character vector of swap trans IDs
      # output...
      # a list with each element containing a dataframe of the swap

      # create empty list
      swap_list <- NULL

      # for loop to create a list containing each swap.
      for (id in transids){
      # create df we will "eat" from
      data <- db_rec
      # select three cols, save to a swap
      one_swap <- data[-1, c(1,2,3)]
      # remove those cols we saved
      data <- data[-c(1,2,3)]
      # rename cols
      names(one_swap) <- c("date", "cf", "type")
      # remove a row, remove na
      one_swap <- one_swap[-1, ]
      one_swap <- na.omit(one_swap)

      # clean it
      one_swap <- db_clean_oneswap(one_swap)
      # save the swap to our list
      swap_list[[id]] <- one_swap
      }
      return(swap_list)
      }

      ## create a list for rec and pay
      swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
      swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)

      ## ways of accessing the lists
      swaps_rec[[1]]
      transid_list[1]
      swaps_rec[[ transid_list[2] ]]
      swaps_rec[[as.character(transid)]]





      share|improve this answer









      $endgroup$
















        0












        0








        0





        $begingroup$

        ## read file 
        # create path
        file4 <- "GetSwapFlowdb2.xlsm"
        filepath3 <- paste(folder1, file4, sep="")
        # read. sheet 2 is recieve leg, and 3 is pay leg
        db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
        db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()

        ## save swap IDs
        (transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
        transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
        stopifnot(transid_list_pay == transid_list)
        rm(transid_list_pay)
        (nr_id <- length(transid_list))

        ## make function that cleans a single swap
        db_clean_oneswap <- function(aswap){
        # input...
        # aswap: a single swap in the form fo data.frame
        # output...
        # aswpap: the cleaned version

        # format cf
        aswap$cf <- as.numeric(aswap$cf)
        # format dates from excel format to yyyy-mm-dd
        aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
        aswap$date <- as.Date(aswap$date)
        # create t
        aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
        # save the swap to our list
        return(aswap)
        }


        ## make function so we can save swap cashflows to a list
        db_create_swaplist <- function(data, transids){
        # input...
        # data: the dataframe which comes from excel
        # transids: a list character vector of swap trans IDs
        # output...
        # a list with each element containing a dataframe of the swap

        # create empty list
        swap_list <- NULL

        # for loop to create a list containing each swap.
        for (id in transids){
        # create df we will "eat" from
        data <- db_rec
        # select three cols, save to a swap
        one_swap <- data[-1, c(1,2,3)]
        # remove those cols we saved
        data <- data[-c(1,2,3)]
        # rename cols
        names(one_swap) <- c("date", "cf", "type")
        # remove a row, remove na
        one_swap <- one_swap[-1, ]
        one_swap <- na.omit(one_swap)

        # clean it
        one_swap <- db_clean_oneswap(one_swap)
        # save the swap to our list
        swap_list[[id]] <- one_swap
        }
        return(swap_list)
        }

        ## create a list for rec and pay
        swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
        swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)

        ## ways of accessing the lists
        swaps_rec[[1]]
        transid_list[1]
        swaps_rec[[ transid_list[2] ]]
        swaps_rec[[as.character(transid)]]





        share|improve this answer









        $endgroup$



        ## read file 
        # create path
        file4 <- "GetSwapFlowdb2.xlsm"
        filepath3 <- paste(folder1, file4, sep="")
        # read. sheet 2 is recieve leg, and 3 is pay leg
        db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
        db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()

        ## save swap IDs
        (transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
        transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
        stopifnot(transid_list_pay == transid_list)
        rm(transid_list_pay)
        (nr_id <- length(transid_list))

        ## make function that cleans a single swap
        db_clean_oneswap <- function(aswap){
        # input...
        # aswap: a single swap in the form fo data.frame
        # output...
        # aswpap: the cleaned version

        # format cf
        aswap$cf <- as.numeric(aswap$cf)
        # format dates from excel format to yyyy-mm-dd
        aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
        aswap$date <- as.Date(aswap$date)
        # create t
        aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
        # save the swap to our list
        return(aswap)
        }


        ## make function so we can save swap cashflows to a list
        db_create_swaplist <- function(data, transids){
        # input...
        # data: the dataframe which comes from excel
        # transids: a list character vector of swap trans IDs
        # output...
        # a list with each element containing a dataframe of the swap

        # create empty list
        swap_list <- NULL

        # for loop to create a list containing each swap.
        for (id in transids){
        # create df we will "eat" from
        data <- db_rec
        # select three cols, save to a swap
        one_swap <- data[-1, c(1,2,3)]
        # remove those cols we saved
        data <- data[-c(1,2,3)]
        # rename cols
        names(one_swap) <- c("date", "cf", "type")
        # remove a row, remove na
        one_swap <- one_swap[-1, ]
        one_swap <- na.omit(one_swap)

        # clean it
        one_swap <- db_clean_oneswap(one_swap)
        # save the swap to our list
        swap_list[[id]] <- one_swap
        }
        return(swap_list)
        }

        ## create a list for rec and pay
        swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
        swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)

        ## ways of accessing the lists
        swaps_rec[[1]]
        transid_list[1]
        swaps_rec[[ transid_list[2] ]]
        swaps_rec[[as.character(transid)]]






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered May 24 '18 at 13:32









        jacobjacob

        148110




        148110






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Data Science Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f32099%2fconvert-from-many-sub-tables-to-a-single-tidy-dataframe%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to label and detect the document text images

            Tabula Rosettana

            Aureus (color)