Convert from many sub-tables to a single tidy dataframe
$begingroup$
I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.
The definition of tidy here is that the output has the following columns:
swap_id
a unique identifier.
leg
either pay or rec.
cashflow
the amount that is recieved
date
the date the cashflow arrives
type
whether the cashflow is of INTEREST of FINAL_EX.
The variable leg
takes on two different values: pay and rec. Currently leg
=pay is in one excel sheet and leg
=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.
I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w
Here is a sample of the content of sheet "rec":
id1 id2
date cf type date cf type
2017-04-04 42961 INTEREST 2015-04-07 33953 INTEREST
2017-07-04 43438 INTEREST 2016-04-04 203161 INTEREST
2017-10-04 43915 INTEREST 2017-04-04 203161 INTEREST
2018-01-04 43915 INTEREST 2018-04-04 203161 INTEREST
2018-04-04 42961 INTEREST 2019-04-04 203161 INTEREST
2018-07-04 43438 INTEREST 2020-04-06 203161 INTEREST
2018-10-04 43915 INTEREST 2021-04-06 203161 INTEREST
2019-01-04 43915 INTEREST 2022-04-04 203161 INTEREST
2019-04-04 42961 INTEREST 2023-04-04 203161 INTEREST
2019-07-04 43438 INTEREST 2023-04-04 5016330 FINAL_EX
2019-10-04 43915 INTEREST
2020-01-07 43915 INTEREST
2020-04-06 43438 INTEREST
2020-07-06 43438 INTEREST
2020-10-05 43915 INTEREST
2021-01-04 43915 INTEREST
2021-04-06 42961 INTEREST
2021-07-05 43438 INTEREST
2021-10-04 43915 INTEREST
2021-10-04 2988563 FINAL_EX
Here is a sample of the content of sheet "pay":
id1 id2
date cf type date cf type
2017-04-04 5250 INTEREST 2015-04-07 30938 INTEREST
2017-07-04 5308 INTEREST 2016-04-04 30938 INTEREST
2017-10-04 5367 INTEREST 2017-04-04 30938 INTEREST
2018-01-04 5367 INTEREST 2018-04-04 30938 INTEREST
2018-04-04 5250 INTEREST 2019-04-04 30938 INTEREST
2018-07-04 5308 INTEREST 2020-04-06 30938 INTEREST
2018-10-04 5367 INTEREST 2021-04-06 30938 INTEREST
2019-01-04 5367 INTEREST 2022-04-04 30938 INTEREST
2019-04-04 5250 INTEREST 2023-04-04 30938 INTEREST
2019-07-04 5308 INTEREST 2023-04-04 540000 FINAL_EX
2019-10-04 5367 INTEREST
2020-01-06 5367 INTEREST
2020-04-06 5308 INTEREST
2020-07-06 5308 INTEREST
2020-10-05 5367 INTEREST
2021-01-04 5367 INTEREST
2021-04-06 5250 INTEREST
2021-07-05 5308 INTEREST
2021-10-04 5367 INTEREST
2021-10-04 315000 FINAL_EX
In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.
Here is how I want the output to be structured after the transformation.
swap_id leg date cf type
id1 pay 2017-04-04 5250 INTEREST
id1 pay 2017-07-04 5308 INTEREST
id1 pay 2017-10-04 5367 INTEREST
id1 pay 2018-01-04 5367 INTEREST
id1 pay 2018-04-04 5250 INTEREST
id1 pay 2018-07-04 5308 INTEREST
id1 pay 2018-10-04 5367 INTEREST
id1 pay 2019-01-04 5367 INTEREST
id1 pay 2019-04-04 5250 INTEREST
id1 pay 2019-07-04 5308 INTEREST
id1 pay 2019-10-04 5367 INTEREST
id1 pay 2020-01-06 5367 INTEREST
id1 pay 2020-04-06 5308 INTEREST
id1 pay 2020-07-06 5308 INTEREST
id1 pay 2020-10-05 5367 INTEREST
id1 pay 2021-01-04 5367 INTEREST
id1 pay 2021-04-06 5250 INTEREST
id1 pay 2021-07-05 5308 INTEREST
id1 pay 2021-10-04 5367 INTEREST
id1 pay 2021-10-04 315000 FINAL_EX
id1 rec 2017-04-04 42961 INTEREST
id1 rec 2017-07-04 43438 INTEREST
id1 rec 2017-10-04 43915 INTEREST
id1 rec 2018-01-04 43915 INTEREST
id1 rec 2018-04-04 42961 INTEREST
id1 rec 2018-07-04 43438 INTEREST
id1 rec 2018-10-04 43915 INTEREST
id1 rec 2019-01-04 43915 INTEREST
id1 rec 2019-04-04 42961 INTEREST
id1 rec 2019-07-04 43438 INTEREST
id1 rec 2019-10-04 43915 INTEREST
id1 rec 2020-01-07 43915 INTEREST
id1 rec 2020-04-06 43438 INTEREST
id1 rec 2020-07-06 43438 INTEREST
id1 rec 2020-10-05 43915 INTEREST
id1 rec 2021-01-04 43915 INTEREST
id1 rec 2021-04-06 42961 INTEREST
id1 rec 2021-07-05 43438 INTEREST
id1 rec 2021-10-04 43915 INTEREST
id1 rec 2021-10-04 2988563 FINAL_EX
id2 … … … … and so on...
r data-cleaning
$endgroup$
bumped to the homepage by Community♦ 4 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
$begingroup$
I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.
The definition of tidy here is that the output has the following columns:
swap_id
a unique identifier.
leg
either pay or rec.
cashflow
the amount that is recieved
date
the date the cashflow arrives
type
whether the cashflow is of INTEREST of FINAL_EX.
The variable leg
takes on two different values: pay and rec. Currently leg
=pay is in one excel sheet and leg
=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.
I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w
Here is a sample of the content of sheet "rec":
id1 id2
date cf type date cf type
2017-04-04 42961 INTEREST 2015-04-07 33953 INTEREST
2017-07-04 43438 INTEREST 2016-04-04 203161 INTEREST
2017-10-04 43915 INTEREST 2017-04-04 203161 INTEREST
2018-01-04 43915 INTEREST 2018-04-04 203161 INTEREST
2018-04-04 42961 INTEREST 2019-04-04 203161 INTEREST
2018-07-04 43438 INTEREST 2020-04-06 203161 INTEREST
2018-10-04 43915 INTEREST 2021-04-06 203161 INTEREST
2019-01-04 43915 INTEREST 2022-04-04 203161 INTEREST
2019-04-04 42961 INTEREST 2023-04-04 203161 INTEREST
2019-07-04 43438 INTEREST 2023-04-04 5016330 FINAL_EX
2019-10-04 43915 INTEREST
2020-01-07 43915 INTEREST
2020-04-06 43438 INTEREST
2020-07-06 43438 INTEREST
2020-10-05 43915 INTEREST
2021-01-04 43915 INTEREST
2021-04-06 42961 INTEREST
2021-07-05 43438 INTEREST
2021-10-04 43915 INTEREST
2021-10-04 2988563 FINAL_EX
Here is a sample of the content of sheet "pay":
id1 id2
date cf type date cf type
2017-04-04 5250 INTEREST 2015-04-07 30938 INTEREST
2017-07-04 5308 INTEREST 2016-04-04 30938 INTEREST
2017-10-04 5367 INTEREST 2017-04-04 30938 INTEREST
2018-01-04 5367 INTEREST 2018-04-04 30938 INTEREST
2018-04-04 5250 INTEREST 2019-04-04 30938 INTEREST
2018-07-04 5308 INTEREST 2020-04-06 30938 INTEREST
2018-10-04 5367 INTEREST 2021-04-06 30938 INTEREST
2019-01-04 5367 INTEREST 2022-04-04 30938 INTEREST
2019-04-04 5250 INTEREST 2023-04-04 30938 INTEREST
2019-07-04 5308 INTEREST 2023-04-04 540000 FINAL_EX
2019-10-04 5367 INTEREST
2020-01-06 5367 INTEREST
2020-04-06 5308 INTEREST
2020-07-06 5308 INTEREST
2020-10-05 5367 INTEREST
2021-01-04 5367 INTEREST
2021-04-06 5250 INTEREST
2021-07-05 5308 INTEREST
2021-10-04 5367 INTEREST
2021-10-04 315000 FINAL_EX
In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.
Here is how I want the output to be structured after the transformation.
swap_id leg date cf type
id1 pay 2017-04-04 5250 INTEREST
id1 pay 2017-07-04 5308 INTEREST
id1 pay 2017-10-04 5367 INTEREST
id1 pay 2018-01-04 5367 INTEREST
id1 pay 2018-04-04 5250 INTEREST
id1 pay 2018-07-04 5308 INTEREST
id1 pay 2018-10-04 5367 INTEREST
id1 pay 2019-01-04 5367 INTEREST
id1 pay 2019-04-04 5250 INTEREST
id1 pay 2019-07-04 5308 INTEREST
id1 pay 2019-10-04 5367 INTEREST
id1 pay 2020-01-06 5367 INTEREST
id1 pay 2020-04-06 5308 INTEREST
id1 pay 2020-07-06 5308 INTEREST
id1 pay 2020-10-05 5367 INTEREST
id1 pay 2021-01-04 5367 INTEREST
id1 pay 2021-04-06 5250 INTEREST
id1 pay 2021-07-05 5308 INTEREST
id1 pay 2021-10-04 5367 INTEREST
id1 pay 2021-10-04 315000 FINAL_EX
id1 rec 2017-04-04 42961 INTEREST
id1 rec 2017-07-04 43438 INTEREST
id1 rec 2017-10-04 43915 INTEREST
id1 rec 2018-01-04 43915 INTEREST
id1 rec 2018-04-04 42961 INTEREST
id1 rec 2018-07-04 43438 INTEREST
id1 rec 2018-10-04 43915 INTEREST
id1 rec 2019-01-04 43915 INTEREST
id1 rec 2019-04-04 42961 INTEREST
id1 rec 2019-07-04 43438 INTEREST
id1 rec 2019-10-04 43915 INTEREST
id1 rec 2020-01-07 43915 INTEREST
id1 rec 2020-04-06 43438 INTEREST
id1 rec 2020-07-06 43438 INTEREST
id1 rec 2020-10-05 43915 INTEREST
id1 rec 2021-01-04 43915 INTEREST
id1 rec 2021-04-06 42961 INTEREST
id1 rec 2021-07-05 43438 INTEREST
id1 rec 2021-10-04 43915 INTEREST
id1 rec 2021-10-04 2988563 FINAL_EX
id2 … … … … and so on...
r data-cleaning
$endgroup$
bumped to the homepage by Community♦ 4 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57
add a comment |
$begingroup$
I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.
The definition of tidy here is that the output has the following columns:
swap_id
a unique identifier.
leg
either pay or rec.
cashflow
the amount that is recieved
date
the date the cashflow arrives
type
whether the cashflow is of INTEREST of FINAL_EX.
The variable leg
takes on two different values: pay and rec. Currently leg
=pay is in one excel sheet and leg
=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.
I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w
Here is a sample of the content of sheet "rec":
id1 id2
date cf type date cf type
2017-04-04 42961 INTEREST 2015-04-07 33953 INTEREST
2017-07-04 43438 INTEREST 2016-04-04 203161 INTEREST
2017-10-04 43915 INTEREST 2017-04-04 203161 INTEREST
2018-01-04 43915 INTEREST 2018-04-04 203161 INTEREST
2018-04-04 42961 INTEREST 2019-04-04 203161 INTEREST
2018-07-04 43438 INTEREST 2020-04-06 203161 INTEREST
2018-10-04 43915 INTEREST 2021-04-06 203161 INTEREST
2019-01-04 43915 INTEREST 2022-04-04 203161 INTEREST
2019-04-04 42961 INTEREST 2023-04-04 203161 INTEREST
2019-07-04 43438 INTEREST 2023-04-04 5016330 FINAL_EX
2019-10-04 43915 INTEREST
2020-01-07 43915 INTEREST
2020-04-06 43438 INTEREST
2020-07-06 43438 INTEREST
2020-10-05 43915 INTEREST
2021-01-04 43915 INTEREST
2021-04-06 42961 INTEREST
2021-07-05 43438 INTEREST
2021-10-04 43915 INTEREST
2021-10-04 2988563 FINAL_EX
Here is a sample of the content of sheet "pay":
id1 id2
date cf type date cf type
2017-04-04 5250 INTEREST 2015-04-07 30938 INTEREST
2017-07-04 5308 INTEREST 2016-04-04 30938 INTEREST
2017-10-04 5367 INTEREST 2017-04-04 30938 INTEREST
2018-01-04 5367 INTEREST 2018-04-04 30938 INTEREST
2018-04-04 5250 INTEREST 2019-04-04 30938 INTEREST
2018-07-04 5308 INTEREST 2020-04-06 30938 INTEREST
2018-10-04 5367 INTEREST 2021-04-06 30938 INTEREST
2019-01-04 5367 INTEREST 2022-04-04 30938 INTEREST
2019-04-04 5250 INTEREST 2023-04-04 30938 INTEREST
2019-07-04 5308 INTEREST 2023-04-04 540000 FINAL_EX
2019-10-04 5367 INTEREST
2020-01-06 5367 INTEREST
2020-04-06 5308 INTEREST
2020-07-06 5308 INTEREST
2020-10-05 5367 INTEREST
2021-01-04 5367 INTEREST
2021-04-06 5250 INTEREST
2021-07-05 5308 INTEREST
2021-10-04 5367 INTEREST
2021-10-04 315000 FINAL_EX
In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.
Here is how I want the output to be structured after the transformation.
swap_id leg date cf type
id1 pay 2017-04-04 5250 INTEREST
id1 pay 2017-07-04 5308 INTEREST
id1 pay 2017-10-04 5367 INTEREST
id1 pay 2018-01-04 5367 INTEREST
id1 pay 2018-04-04 5250 INTEREST
id1 pay 2018-07-04 5308 INTEREST
id1 pay 2018-10-04 5367 INTEREST
id1 pay 2019-01-04 5367 INTEREST
id1 pay 2019-04-04 5250 INTEREST
id1 pay 2019-07-04 5308 INTEREST
id1 pay 2019-10-04 5367 INTEREST
id1 pay 2020-01-06 5367 INTEREST
id1 pay 2020-04-06 5308 INTEREST
id1 pay 2020-07-06 5308 INTEREST
id1 pay 2020-10-05 5367 INTEREST
id1 pay 2021-01-04 5367 INTEREST
id1 pay 2021-04-06 5250 INTEREST
id1 pay 2021-07-05 5308 INTEREST
id1 pay 2021-10-04 5367 INTEREST
id1 pay 2021-10-04 315000 FINAL_EX
id1 rec 2017-04-04 42961 INTEREST
id1 rec 2017-07-04 43438 INTEREST
id1 rec 2017-10-04 43915 INTEREST
id1 rec 2018-01-04 43915 INTEREST
id1 rec 2018-04-04 42961 INTEREST
id1 rec 2018-07-04 43438 INTEREST
id1 rec 2018-10-04 43915 INTEREST
id1 rec 2019-01-04 43915 INTEREST
id1 rec 2019-04-04 42961 INTEREST
id1 rec 2019-07-04 43438 INTEREST
id1 rec 2019-10-04 43915 INTEREST
id1 rec 2020-01-07 43915 INTEREST
id1 rec 2020-04-06 43438 INTEREST
id1 rec 2020-07-06 43438 INTEREST
id1 rec 2020-10-05 43915 INTEREST
id1 rec 2021-01-04 43915 INTEREST
id1 rec 2021-04-06 42961 INTEREST
id1 rec 2021-07-05 43438 INTEREST
id1 rec 2021-10-04 43915 INTEREST
id1 rec 2021-10-04 2988563 FINAL_EX
id2 … … … … and so on...
r data-cleaning
$endgroup$
I have data in a bad format that I want to make tidy using an R script. I lack the skills to convert it into the format I want, so I look for an answer that (a) outlines a method, or (b) hands me the entire script.
The definition of tidy here is that the output has the following columns:
swap_id
a unique identifier.
leg
either pay or rec.
cashflow
the amount that is recieved
date
the date the cashflow arrives
type
whether the cashflow is of INTEREST of FINAL_EX.
The variable leg
takes on two different values: pay and rec. Currently leg
=pay is in one excel sheet and leg
=rec is in a second excel sheet. The entire excel workbook hence contains two sheets.
I have uploaded the entire excel file to https://we.tl/V9C5iVMH4w
Here is a sample of the content of sheet "rec":
id1 id2
date cf type date cf type
2017-04-04 42961 INTEREST 2015-04-07 33953 INTEREST
2017-07-04 43438 INTEREST 2016-04-04 203161 INTEREST
2017-10-04 43915 INTEREST 2017-04-04 203161 INTEREST
2018-01-04 43915 INTEREST 2018-04-04 203161 INTEREST
2018-04-04 42961 INTEREST 2019-04-04 203161 INTEREST
2018-07-04 43438 INTEREST 2020-04-06 203161 INTEREST
2018-10-04 43915 INTEREST 2021-04-06 203161 INTEREST
2019-01-04 43915 INTEREST 2022-04-04 203161 INTEREST
2019-04-04 42961 INTEREST 2023-04-04 203161 INTEREST
2019-07-04 43438 INTEREST 2023-04-04 5016330 FINAL_EX
2019-10-04 43915 INTEREST
2020-01-07 43915 INTEREST
2020-04-06 43438 INTEREST
2020-07-06 43438 INTEREST
2020-10-05 43915 INTEREST
2021-01-04 43915 INTEREST
2021-04-06 42961 INTEREST
2021-07-05 43438 INTEREST
2021-10-04 43915 INTEREST
2021-10-04 2988563 FINAL_EX
Here is a sample of the content of sheet "pay":
id1 id2
date cf type date cf type
2017-04-04 5250 INTEREST 2015-04-07 30938 INTEREST
2017-07-04 5308 INTEREST 2016-04-04 30938 INTEREST
2017-10-04 5367 INTEREST 2017-04-04 30938 INTEREST
2018-01-04 5367 INTEREST 2018-04-04 30938 INTEREST
2018-04-04 5250 INTEREST 2019-04-04 30938 INTEREST
2018-07-04 5308 INTEREST 2020-04-06 30938 INTEREST
2018-10-04 5367 INTEREST 2021-04-06 30938 INTEREST
2019-01-04 5367 INTEREST 2022-04-04 30938 INTEREST
2019-04-04 5250 INTEREST 2023-04-04 30938 INTEREST
2019-07-04 5308 INTEREST 2023-04-04 540000 FINAL_EX
2019-10-04 5367 INTEREST
2020-01-06 5367 INTEREST
2020-04-06 5308 INTEREST
2020-07-06 5308 INTEREST
2020-10-05 5367 INTEREST
2021-01-04 5367 INTEREST
2021-04-06 5250 INTEREST
2021-07-05 5308 INTEREST
2021-10-04 5367 INTEREST
2021-10-04 315000 FINAL_EX
In other words, each id gets itws own little table. An every table is separated using an empty column.
This is good for eye balling the data, but horrible for working with it.
Here is how I want the output to be structured after the transformation.
swap_id leg date cf type
id1 pay 2017-04-04 5250 INTEREST
id1 pay 2017-07-04 5308 INTEREST
id1 pay 2017-10-04 5367 INTEREST
id1 pay 2018-01-04 5367 INTEREST
id1 pay 2018-04-04 5250 INTEREST
id1 pay 2018-07-04 5308 INTEREST
id1 pay 2018-10-04 5367 INTEREST
id1 pay 2019-01-04 5367 INTEREST
id1 pay 2019-04-04 5250 INTEREST
id1 pay 2019-07-04 5308 INTEREST
id1 pay 2019-10-04 5367 INTEREST
id1 pay 2020-01-06 5367 INTEREST
id1 pay 2020-04-06 5308 INTEREST
id1 pay 2020-07-06 5308 INTEREST
id1 pay 2020-10-05 5367 INTEREST
id1 pay 2021-01-04 5367 INTEREST
id1 pay 2021-04-06 5250 INTEREST
id1 pay 2021-07-05 5308 INTEREST
id1 pay 2021-10-04 5367 INTEREST
id1 pay 2021-10-04 315000 FINAL_EX
id1 rec 2017-04-04 42961 INTEREST
id1 rec 2017-07-04 43438 INTEREST
id1 rec 2017-10-04 43915 INTEREST
id1 rec 2018-01-04 43915 INTEREST
id1 rec 2018-04-04 42961 INTEREST
id1 rec 2018-07-04 43438 INTEREST
id1 rec 2018-10-04 43915 INTEREST
id1 rec 2019-01-04 43915 INTEREST
id1 rec 2019-04-04 42961 INTEREST
id1 rec 2019-07-04 43438 INTEREST
id1 rec 2019-10-04 43915 INTEREST
id1 rec 2020-01-07 43915 INTEREST
id1 rec 2020-04-06 43438 INTEREST
id1 rec 2020-07-06 43438 INTEREST
id1 rec 2020-10-05 43915 INTEREST
id1 rec 2021-01-04 43915 INTEREST
id1 rec 2021-04-06 42961 INTEREST
id1 rec 2021-07-05 43438 INTEREST
id1 rec 2021-10-04 43915 INTEREST
id1 rec 2021-10-04 2988563 FINAL_EX
id2 … … … … and so on...
r data-cleaning
r data-cleaning
asked May 24 '18 at 10:01
jacobjacob
148110
148110
bumped to the homepage by Community♦ 4 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 4 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57
add a comment |
$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57
$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57
$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
## read file
# create path
file4 <- "GetSwapFlowdb2.xlsm"
filepath3 <- paste(folder1, file4, sep="")
# read. sheet 2 is recieve leg, and 3 is pay leg
db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()
## save swap IDs
(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
stopifnot(transid_list_pay == transid_list)
rm(transid_list_pay)
(nr_id <- length(transid_list))
## make function that cleans a single swap
db_clean_oneswap <- function(aswap){
# input...
# aswap: a single swap in the form fo data.frame
# output...
# aswpap: the cleaned version
# format cf
aswap$cf <- as.numeric(aswap$cf)
# format dates from excel format to yyyy-mm-dd
aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
aswap$date <- as.Date(aswap$date)
# create t
aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
# save the swap to our list
return(aswap)
}
## make function so we can save swap cashflows to a list
db_create_swaplist <- function(data, transids){
# input...
# data: the dataframe which comes from excel
# transids: a list character vector of swap trans IDs
# output...
# a list with each element containing a dataframe of the swap
# create empty list
swap_list <- NULL
# for loop to create a list containing each swap.
for (id in transids){
# create df we will "eat" from
data <- db_rec
# select three cols, save to a swap
one_swap <- data[-1, c(1,2,3)]
# remove those cols we saved
data <- data[-c(1,2,3)]
# rename cols
names(one_swap) <- c("date", "cf", "type")
# remove a row, remove na
one_swap <- one_swap[-1, ]
one_swap <- na.omit(one_swap)
# clean it
one_swap <- db_clean_oneswap(one_swap)
# save the swap to our list
swap_list[[id]] <- one_swap
}
return(swap_list)
}
## create a list for rec and pay
swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)
## ways of accessing the lists
swaps_rec[[1]]
transid_list[1]
swaps_rec[[ transid_list[2] ]]
swaps_rec[[as.character(transid)]]
$endgroup$
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f32099%2fconvert-from-many-sub-tables-to-a-single-tidy-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
## read file
# create path
file4 <- "GetSwapFlowdb2.xlsm"
filepath3 <- paste(folder1, file4, sep="")
# read. sheet 2 is recieve leg, and 3 is pay leg
db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()
## save swap IDs
(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
stopifnot(transid_list_pay == transid_list)
rm(transid_list_pay)
(nr_id <- length(transid_list))
## make function that cleans a single swap
db_clean_oneswap <- function(aswap){
# input...
# aswap: a single swap in the form fo data.frame
# output...
# aswpap: the cleaned version
# format cf
aswap$cf <- as.numeric(aswap$cf)
# format dates from excel format to yyyy-mm-dd
aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
aswap$date <- as.Date(aswap$date)
# create t
aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
# save the swap to our list
return(aswap)
}
## make function so we can save swap cashflows to a list
db_create_swaplist <- function(data, transids){
# input...
# data: the dataframe which comes from excel
# transids: a list character vector of swap trans IDs
# output...
# a list with each element containing a dataframe of the swap
# create empty list
swap_list <- NULL
# for loop to create a list containing each swap.
for (id in transids){
# create df we will "eat" from
data <- db_rec
# select three cols, save to a swap
one_swap <- data[-1, c(1,2,3)]
# remove those cols we saved
data <- data[-c(1,2,3)]
# rename cols
names(one_swap) <- c("date", "cf", "type")
# remove a row, remove na
one_swap <- one_swap[-1, ]
one_swap <- na.omit(one_swap)
# clean it
one_swap <- db_clean_oneswap(one_swap)
# save the swap to our list
swap_list[[id]] <- one_swap
}
return(swap_list)
}
## create a list for rec and pay
swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)
## ways of accessing the lists
swaps_rec[[1]]
transid_list[1]
swaps_rec[[ transid_list[2] ]]
swaps_rec[[as.character(transid)]]
$endgroup$
add a comment |
$begingroup$
## read file
# create path
file4 <- "GetSwapFlowdb2.xlsm"
filepath3 <- paste(folder1, file4, sep="")
# read. sheet 2 is recieve leg, and 3 is pay leg
db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()
## save swap IDs
(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
stopifnot(transid_list_pay == transid_list)
rm(transid_list_pay)
(nr_id <- length(transid_list))
## make function that cleans a single swap
db_clean_oneswap <- function(aswap){
# input...
# aswap: a single swap in the form fo data.frame
# output...
# aswpap: the cleaned version
# format cf
aswap$cf <- as.numeric(aswap$cf)
# format dates from excel format to yyyy-mm-dd
aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
aswap$date <- as.Date(aswap$date)
# create t
aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
# save the swap to our list
return(aswap)
}
## make function so we can save swap cashflows to a list
db_create_swaplist <- function(data, transids){
# input...
# data: the dataframe which comes from excel
# transids: a list character vector of swap trans IDs
# output...
# a list with each element containing a dataframe of the swap
# create empty list
swap_list <- NULL
# for loop to create a list containing each swap.
for (id in transids){
# create df we will "eat" from
data <- db_rec
# select three cols, save to a swap
one_swap <- data[-1, c(1,2,3)]
# remove those cols we saved
data <- data[-c(1,2,3)]
# rename cols
names(one_swap) <- c("date", "cf", "type")
# remove a row, remove na
one_swap <- one_swap[-1, ]
one_swap <- na.omit(one_swap)
# clean it
one_swap <- db_clean_oneswap(one_swap)
# save the swap to our list
swap_list[[id]] <- one_swap
}
return(swap_list)
}
## create a list for rec and pay
swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)
## ways of accessing the lists
swaps_rec[[1]]
transid_list[1]
swaps_rec[[ transid_list[2] ]]
swaps_rec[[as.character(transid)]]
$endgroup$
add a comment |
$begingroup$
## read file
# create path
file4 <- "GetSwapFlowdb2.xlsm"
filepath3 <- paste(folder1, file4, sep="")
# read. sheet 2 is recieve leg, and 3 is pay leg
db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()
## save swap IDs
(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
stopifnot(transid_list_pay == transid_list)
rm(transid_list_pay)
(nr_id <- length(transid_list))
## make function that cleans a single swap
db_clean_oneswap <- function(aswap){
# input...
# aswap: a single swap in the form fo data.frame
# output...
# aswpap: the cleaned version
# format cf
aswap$cf <- as.numeric(aswap$cf)
# format dates from excel format to yyyy-mm-dd
aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
aswap$date <- as.Date(aswap$date)
# create t
aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
# save the swap to our list
return(aswap)
}
## make function so we can save swap cashflows to a list
db_create_swaplist <- function(data, transids){
# input...
# data: the dataframe which comes from excel
# transids: a list character vector of swap trans IDs
# output...
# a list with each element containing a dataframe of the swap
# create empty list
swap_list <- NULL
# for loop to create a list containing each swap.
for (id in transids){
# create df we will "eat" from
data <- db_rec
# select three cols, save to a swap
one_swap <- data[-1, c(1,2,3)]
# remove those cols we saved
data <- data[-c(1,2,3)]
# rename cols
names(one_swap) <- c("date", "cf", "type")
# remove a row, remove na
one_swap <- one_swap[-1, ]
one_swap <- na.omit(one_swap)
# clean it
one_swap <- db_clean_oneswap(one_swap)
# save the swap to our list
swap_list[[id]] <- one_swap
}
return(swap_list)
}
## create a list for rec and pay
swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)
## ways of accessing the lists
swaps_rec[[1]]
transid_list[1]
swaps_rec[[ transid_list[2] ]]
swaps_rec[[as.character(transid)]]
$endgroup$
## read file
# create path
file4 <- "GetSwapFlowdb2.xlsm"
filepath3 <- paste(folder1, file4, sep="")
# read. sheet 2 is recieve leg, and 3 is pay leg
db_rec <- read_excel(filepath3, col_names = FALSE, sheet=2) %>% as.data.frame()
db_pay <- read_excel(filepath3, col_names = FALSE, sheet=3) %>% as.data.frame()
## save swap IDs
(transid_list <- db_rec[1,] %>% t() %>% na.omit() %>% as.vector()) #%>% as.numeric()
transid_list_pay <- db_pay[1,] %>% t() %>% na.omit() %>% as.vector()
stopifnot(transid_list_pay == transid_list)
rm(transid_list_pay)
(nr_id <- length(transid_list))
## make function that cleans a single swap
db_clean_oneswap <- function(aswap){
# input...
# aswap: a single swap in the form fo data.frame
# output...
# aswpap: the cleaned version
# format cf
aswap$cf <- as.numeric(aswap$cf)
# format dates from excel format to yyyy-mm-dd
aswap$date <- as.POSIXct(as.Date(as.numeric(aswap$date), origin="1899-12-30"))
aswap$date <- as.Date(aswap$date)
# create t
aswap$t <- time_length(difftime(aswap$date, date_valuation), "years")
# save the swap to our list
return(aswap)
}
## make function so we can save swap cashflows to a list
db_create_swaplist <- function(data, transids){
# input...
# data: the dataframe which comes from excel
# transids: a list character vector of swap trans IDs
# output...
# a list with each element containing a dataframe of the swap
# create empty list
swap_list <- NULL
# for loop to create a list containing each swap.
for (id in transids){
# create df we will "eat" from
data <- db_rec
# select three cols, save to a swap
one_swap <- data[-1, c(1,2,3)]
# remove those cols we saved
data <- data[-c(1,2,3)]
# rename cols
names(one_swap) <- c("date", "cf", "type")
# remove a row, remove na
one_swap <- one_swap[-1, ]
one_swap <- na.omit(one_swap)
# clean it
one_swap <- db_clean_oneswap(one_swap)
# save the swap to our list
swap_list[[id]] <- one_swap
}
return(swap_list)
}
## create a list for rec and pay
swaps_rec <- db_create_swaplist(data = db_rec, transids = transid_list)
swaps_pay <- db_create_swaplist(data = db_pay, transids = transid_list)
## ways of accessing the lists
swaps_rec[[1]]
transid_list[1]
swaps_rec[[ transid_list[2] ]]
swaps_rec[[as.character(transid)]]
answered May 24 '18 at 13:32
jacobjacob
148110
148110
add a comment |
add a comment |
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f32099%2fconvert-from-many-sub-tables-to-a-single-tidy-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
$begingroup$
Joining on the primary keys?
$endgroup$
– Aditya
May 24 '18 at 11:57