Error: ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples...
$begingroup$
I'm trying to classify system calls. The code below is adapted from a text classification project. My system calls are represented as sequences of integers between 1 and 340. The error I get is:
ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 0 target samples. I don't know what to do, as this is my first time. Thank you in advance.
`
import pandas as pd
import keras
from keras.layers import Input, Embedding, GlobalAveragePooling1D, Dense
from keras.models import Model
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.txt")
df_test = pd.read_csv("validation.txt")
# split arrays into train and test data (cross validation)
train_text, test_text, train_y, test_y = train_test_split(df, df, test_size=0.2)
MAX_NB_WORDS = 5700
# get the raw text data
texts_train = train_text.astype(str)
texts_test = test_text.astype(str)
# finally, vectorize the text samples into a 2D integer tensor
tokenizer = Tokenizer(nb_words=MAX_NB_WORDS, char_level=False)
tokenizer.fit_on_texts(texts_train)
sequences = tokenizer.texts_to_sequences(texts_train)
sequences_test = tokenizer.texts_to_sequences(texts_test)
word_index = tokenizer.word_index
type(tokenizer.word_index), len(tokenizer.word_index)
index_to_word = dict((i, w) for w, i in tokenizer.word_index.items())
" ".join([index_to_word[i] for i in sequences[0]])
seq_lens = [len(s) for s in sequences]
MAX_SEQUENCE_LENGTH = 100
# pad sequences with 0s
x_train = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
x_test = pad_sequences(sequences_test, maxlen=MAX_SEQUENCE_LENGTH)
#print('Shape of data train:', x_train.shape)  # this gave (1, 100)
#print('Shape of data test tensor:', x_test.shape)
y_train = train_y
y_test = test_y
print('Shape of label tensor:', y_train.shape)
EMBEDDING_DIM = 32
N_CLASSES = 2
y_train = keras.utils.to_categorical(y_train, N_CLASSES)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='float32')
embedding_layer = Embedding(MAX_NB_WORDS, EMBEDDING_DIM,
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=True)
embedded_sequences = embedding_layer(sequence_input)
average = GlobalAveragePooling1D()(embedded_sequences)
predictions = Dense(N_CLASSES, activation='softmax')(average)
model = Model(sequence_input, predictions)
model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['acc'])
model.fit(x_train, y_train, validation_split=0.1,
          nb_epoch=10, batch_size=1)
output_test = model.predict(x_test)
print("test auc:", roc_auc_score(y_test, output_test[:, 1]))
`
python neural-network keras nlp
$endgroup$
$begingroup$
Print the shapes of the feature and label arrays, and add the output to the question.
$endgroup$
– Shubham Panchal
23 hours ago
$begingroup$
Shape of label tensor: (0, 1). y_train.shape[0] = 0, x_train.shape[0] = 1. x_train display: [1 4 6 7 7 ......]. y_train display: [ ]
$endgroup$
– Kikio
20 hours ago
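One possible reading of these shapes, offered only as an assumption since the data files are not shown: the training split is a DataFrame with one column and zero rows. Iterating over a pandas DataFrame yields its column labels rather than its rows, so code that loops over `texts_train` would see exactly one item (the column label) even though there are no rows, while the labels stay empty. A minimal sketch with a hypothetical one-column frame:

```python
import pandas as pd

# Hypothetical frame: one column, zero rows. This is an assumption about what
# the training split may have looked like, based on the reported shape (0, 1).
empty_split = pd.DataFrame({"1 4 6 7 7": []})

# Iterating a DataFrame yields its column labels, not its rows.
items_seen_by_iteration = list(empty_split.astype(str))

print(empty_split.shape)        # number of rows and columns
print(items_seen_by_iteration)  # the column labels
```

If this is what is happening, selecting a single column (a Series) and converting it to a list of strings before tokenizing would make the number of texts follow the number of rows.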
$begingroup$
When I remove (comment out) this line: y_train = keras.utils.to_categorical(y_train, N_CLASSES), the error changes to: ValueError: Error when checking target: expected dense_1 with shape (2,), but got array with shape (1,). So there is still a problem with the shapes.
$endgroup$
– Kikio
20 hours ago
$begingroup$
Don't remove that line. Look: y_train has 0 samples and x_train has 1 sample. Both should have the same number of samples.
$endgroup$
– Shubham Panchal
20 hours ago
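The point about matching sample counts can be sketched as follows. This is only an illustration under stated assumptions: the toy `sequences` and `labels` are made up, the real file layout is not shown in the question, and the small `pad` helper is a hypothetical stand-in that mimics left-padding with 0s so the example does not depend on Keras. The key difference from the question's code is that features and labels are passed to `train_test_split` as two separate arrays of equal length, rather than passing the same DataFrame twice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: each sample is a list of system-call ids (1..340)
# with one binary label per sample. The real data layout is an assumption.
sequences = [[1, 4, 6, 7, 7], [3, 3, 12], [340, 2], [5, 9, 9, 1], [8, 8, 8, 8]]
labels = [0, 1, 0, 1, 0]

MAX_SEQUENCE_LENGTH = 10


def pad(seq, maxlen):
    """Keep the last maxlen items and left-pad with 0s (hypothetical helper)."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq


# One row per sample in x, one entry per sample in y.
x = np.array([pad(s, MAX_SEQUENCE_LENGTH) for s in sequences])
y = np.array(labels)

# Features and labels are split together, so the row counts stay aligned.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print("x_train:", x_train.shape, "y_train:", y_train.shape)
print("x_test:", x_test.shape, "y_test:", y_test.shape)
```

Checking `x_train.shape[0] == y_train.shape[0]` right after the split, before any call to `model.fit`, should surface this kind of mismatch early.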
asked yesterday by Kikio