Multivariate Time-Series Clustering
$begingroup$
I have a streaming data along with timestamp dataset that looks like this:
1.png
Timestamp can be inclusive of "seconds" too, but the data may or may not change every second. it depends on the previous values(rows i.e data which came earlier w.r.t time).
Column1, Column2 .... ColumnN correspond to the variables (they change over time) and "Label" shows the different samples. You can assume that the values tend to decrease over time for a particular label.
Labels A1,B1,C1.........A2,... M labels.
Note : Values of timeNew of a Label depends on values of timeOld of that Label and Labels belong to its cluster.
I need to group Labels with similar behavior over time together (e.g. Label A1 and Label C1 should be put in the same cluster and B1,D2 may fall into same cluster over time as they tend to behave similar over time).
I thought of using DTW and get the similarity of each Label with respect to other Labels. but not sure, how to proceed when i have N Columns.
To be precise, i need to group Labels based on their similarities (Column1 .. ColumnN) over time and group them.
Once i group them when new data comes in i should be able to predict the values(Column1.. ColumnN) for a Label based on the previously seen data(can be just minutes closer to the current prediction) and the values associated with the Labels in its cluster and predict it accordingly.
machine-learning python clustering time-series
$endgroup$
bumped to the homepage by Community♦ 6 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
$begingroup$
I have a streaming data along with timestamp dataset that looks like this:
1.png
Timestamp can be inclusive of "seconds" too, but the data may or may not change every second. it depends on the previous values(rows i.e data which came earlier w.r.t time).
Column1, Column2 .... ColumnN correspond to the variables (they change over time) and "Label" shows the different samples. You can assume that the values tend to decrease over time for a particular label.
Labels A1,B1,C1.........A2,... M labels.
Note : Values of timeNew of a Label depends on values of timeOld of that Label and Labels belong to its cluster.
I need to group Labels with similar behavior over time together (e.g. Label A1 and Label C1 should be put in the same cluster and B1,D2 may fall into same cluster over time as they tend to behave similar over time).
I thought of using DTW and get the similarity of each Label with respect to other Labels. but not sure, how to proceed when i have N Columns.
To be precise, i need to group Labels based on their similarities (Column1 .. ColumnN) over time and group them.
Once i group them when new data comes in i should be able to predict the values(Column1.. ColumnN) for a Label based on the previously seen data(can be just minutes closer to the current prediction) and the values associated with the Labels in its cluster and predict it accordingly.
machine-learning python clustering time-series
$endgroup$
bumped to the homepage by Community♦ 6 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
$begingroup$
@Vincenzo please look into this question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:24
$begingroup$
@Toros91 refer the question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:25
$begingroup$
I've retracted my vote, lets see what others think about it as well.
$endgroup$
– Toros91
Mar 22 '18 at 1:22
$begingroup$
Looks like you are looking for a Multivariate Time series forecasting.
$endgroup$
– Rajshekar Reddy
Nov 20 '18 at 18:38
add a comment |
$begingroup$
I have a streaming data along with timestamp dataset that looks like this:
1.png
Timestamp can be inclusive of "seconds" too, but the data may or may not change every second. it depends on the previous values(rows i.e data which came earlier w.r.t time).
Column1, Column2 .... ColumnN correspond to the variables (they change over time) and "Label" shows the different samples. You can assume that the values tend to decrease over time for a particular label.
Labels A1,B1,C1.........A2,... M labels.
Note : Values of timeNew of a Label depends on values of timeOld of that Label and Labels belong to its cluster.
I need to group Labels with similar behavior over time together (e.g. Label A1 and Label C1 should be put in the same cluster and B1,D2 may fall into same cluster over time as they tend to behave similar over time).
I thought of using DTW and get the similarity of each Label with respect to other Labels. but not sure, how to proceed when i have N Columns.
To be precise, i need to group Labels based on their similarities (Column1 .. ColumnN) over time and group them.
Once i group them when new data comes in i should be able to predict the values(Column1.. ColumnN) for a Label based on the previously seen data(can be just minutes closer to the current prediction) and the values associated with the Labels in its cluster and predict it accordingly.
machine-learning python clustering time-series
$endgroup$
I have a streaming data along with timestamp dataset that looks like this:
1.png
Timestamp can be inclusive of "seconds" too, but the data may or may not change every second. it depends on the previous values(rows i.e data which came earlier w.r.t time).
Column1, Column2 .... ColumnN correspond to the variables (they change over time) and "Label" shows the different samples. You can assume that the values tend to decrease over time for a particular label.
Labels A1,B1,C1.........A2,... M labels.
Note : Values of timeNew of a Label depends on values of timeOld of that Label and Labels belong to its cluster.
I need to group Labels with similar behavior over time together (e.g. Label A1 and Label C1 should be put in the same cluster and B1,D2 may fall into same cluster over time as they tend to behave similar over time).
I thought of using DTW and get the similarity of each Label with respect to other Labels. but not sure, how to proceed when i have N Columns.
To be precise, i need to group Labels based on their similarities (Column1 .. ColumnN) over time and group them.
Once i group them when new data comes in i should be able to predict the values(Column1.. ColumnN) for a Label based on the previously seen data(can be just minutes closer to the current prediction) and the values associated with the Labels in its cluster and predict it accordingly.
machine-learning python clustering time-series
machine-learning python clustering time-series
edited Mar 21 '18 at 23:14
Abhinaya Krishna
asked Mar 20 '18 at 0:37
Abhinaya KrishnaAbhinaya Krishna
2413
2413
bumped to the homepage by Community♦ 6 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 6 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
$begingroup$
@Vincenzo please look into this question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:24
$begingroup$
@Toros91 refer the question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:25
$begingroup$
I've retracted my vote, lets see what others think about it as well.
$endgroup$
– Toros91
Mar 22 '18 at 1:22
$begingroup$
Looks like you are looking for a Multivariate Time series forecasting.
$endgroup$
– Rajshekar Reddy
Nov 20 '18 at 18:38
add a comment |
$begingroup$
@Vincenzo please look into this question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:24
$begingroup$
@Toros91 refer the question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:25
$begingroup$
I've retracted my vote, lets see what others think about it as well.
$endgroup$
– Toros91
Mar 22 '18 at 1:22
$begingroup$
Looks like you are looking for a Multivariate Time series forecasting.
$endgroup$
– Rajshekar Reddy
Nov 20 '18 at 18:38
$begingroup$
@Vincenzo please look into this question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:24
$begingroup$
@Vincenzo please look into this question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:24
$begingroup$
@Toros91 refer the question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:25
$begingroup$
@Toros91 refer the question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:25
$begingroup$
I've retracted my vote, lets see what others think about it as well.
$endgroup$
– Toros91
Mar 22 '18 at 1:22
$begingroup$
I've retracted my vote, lets see what others think about it as well.
$endgroup$
– Toros91
Mar 22 '18 at 1:22
$begingroup$
Looks like you are looking for a Multivariate Time series forecasting.
$endgroup$
– Rajshekar Reddy
Nov 20 '18 at 18:38
$begingroup$
Looks like you are looking for a Multivariate Time series forecasting.
$endgroup$
– Rajshekar Reddy
Nov 20 '18 at 18:38
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).
In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).
Three additional advice:
- You can use pandas Dataframes for the pre-processing.
- You should at least have a basic understanding of test and training samples before you start anything.
- The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.
If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).
$endgroup$
$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15
add a comment |
$begingroup$
Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):
An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)
A paper on clustering of time-series
A PCA-based similarity measure for multivariate time-series
A review on feature extraction and pattern recognition methods in time-series data
Before proceeding with any method, I believe it is important to spend some time to think of the following:
Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).
Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).
Depending on what you are trying to find out, you need to decide
whether Dynamic time warping (DTW) is useful for you. For example, if
2 time-series have the exact same pattern but one of the 2 has a
time-delay, should they still belong in the same cluster? (and how
small/big time-delay is acceptable to put them in the same cluster?)
$endgroup$
$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51
$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32
$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f29287%2fmultivariate-time-series-clustering%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).
In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).
Three additional advice:
- You can use pandas Dataframes for the pre-processing.
- You should at least have a basic understanding of test and training samples before you start anything.
- The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.
If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).
$endgroup$
$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15
add a comment |
$begingroup$
It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).
In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).
Three additional advice:
- You can use pandas Dataframes for the pre-processing.
- You should at least have a basic understanding of test and training samples before you start anything.
- The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.
If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).
$endgroup$
$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15
add a comment |
$begingroup$
It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).
In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).
Three additional advice:
- You can use pandas Dataframes for the pre-processing.
- You should at least have a basic understanding of test and training samples before you start anything.
- The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.
If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).
$endgroup$
It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).
In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).
Three additional advice:
- You can use pandas Dataframes for the pre-processing.
- You should at least have a basic understanding of test and training samples before you start anything.
- The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.
If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).
answered Mar 20 '18 at 11:01
El BurroEl Burro
455311
455311
$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15
add a comment |
$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15
$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15
$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15
add a comment |
$begingroup$
Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):
An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)
A paper on clustering of time-series
A PCA-based similarity measure for multivariate time-series
A review on feature extraction and pattern recognition methods in time-series data
Before proceeding with any method, I believe it is important to spend some time to think of the following:
Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).
Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).
Depending on what you are trying to find out, you need to decide
whether Dynamic time warping (DTW) is useful for you. For example, if
2 time-series have the exact same pattern but one of the 2 has a
time-delay, should they still belong in the same cluster? (and how
small/big time-delay is acceptable to put them in the same cluster?)
$endgroup$
$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51
$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32
$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15
add a comment |
$begingroup$
Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):
An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)
A paper on clustering of time-series
A PCA-based similarity measure for multivariate time-series
A review on feature extraction and pattern recognition methods in time-series data
Before proceeding with any method, I believe it is important to spend some time to think of the following:
Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).
Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).
Depending on what you are trying to find out, you need to decide
whether Dynamic time warping (DTW) is useful for you. For example, if
2 time-series have the exact same pattern but one of the 2 has a
time-delay, should they still belong in the same cluster? (and how
small/big time-delay is acceptable to put them in the same cluster?)
$endgroup$
$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51
$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32
$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15
add a comment |
$begingroup$
Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):
An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)
A paper on clustering of time-series
A PCA-based similarity measure for multivariate time-series
A review on feature extraction and pattern recognition methods in time-series data
Before proceeding with any method, I believe it is important to spend some time to think of the following:
Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).
Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).
Depending on what you are trying to find out, you need to decide
whether Dynamic time warping (DTW) is useful for you. For example, if
2 time-series have the exact same pattern but one of the 2 has a
time-delay, should they still belong in the same cluster? (and how
small/big time-delay is acceptable to put them in the same cluster?)
$endgroup$
Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):
An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)
A paper on clustering of time-series
A PCA-based similarity measure for multivariate time-series
A review on feature extraction and pattern recognition methods in time-series data
Before proceeding with any method, I believe it is important to spend some time to think of the following:
Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).
Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).
Depending on what you are trying to find out, you need to decide
whether Dynamic time warping (DTW) is useful for you. For example, if
2 time-series have the exact same pattern but one of the 2 has a
time-delay, should they still belong in the same cluster? (and how
small/big time-delay is acceptable to put them in the same cluster?)
edited Mar 21 '18 at 9:49
answered Mar 20 '18 at 11:06
missrgmissrg
36518
36518
$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51
$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32
$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15
add a comment |
$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51
$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32
$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15
$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51
$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51
$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32
$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32
$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15
$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15
add a comment |
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f29287%2fmultivariate-time-series-clustering%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
$begingroup$
@Vincenzo please look into this question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:24
$begingroup$
@Toros91 refer the question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:25
$begingroup$
I've retracted my vote, lets see what others think about it as well.
$endgroup$
– Toros91
Mar 22 '18 at 1:22
$begingroup$
Looks like you are looking for a Multivariate Time series forecasting.
$endgroup$
– Rajshekar Reddy
Nov 20 '18 at 18:38