Error: ValueError: input arrays should have the same number of samples as target arrays. Found 1 input samples...
























$begingroup$


I'm trying to classify system calls. The code below is adapted from a text classification project. My system calls are represented as sequences of integers between 1 and 340. The error I get is:



ValueError: input arrays should have the same number of samples as target arrays. Found 1 input samples and 0 target samples.

I don't know what to do, as this is my first time. Thank you in advance.



`



# Imports assumed (they were not shown in the original post)
import pandas as pd
import keras
from keras.layers import Dense, Embedding, GlobalAveragePooling1D, Input
from keras.models import Model
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.txt")
df_test = pd.read_csv("validation.txt")
# split arrays into train and test data (cross validation)
train_text, test_text, train_y, test_y = train_test_split(df, df, test_size=0.2)
MAX_NB_WORDS = 5700

# get the raw text data
texts_train = train_text.astype(str)
texts_test = test_text.astype(str)
# finally, vectorize the text samples into a 2D integer tensor
tokenizer = Tokenizer(nb_words=MAX_NB_WORDS, char_level=False)
tokenizer.fit_on_texts(texts_train)
sequences = tokenizer.texts_to_sequences(texts_train)
sequences_test = tokenizer.texts_to_sequences(texts_test)

word_index = tokenizer.word_index
type(tokenizer.word_index), len(tokenizer.word_index)
index_to_word = dict((i, w) for w, i in tokenizer.word_index.items())
" ".join([index_to_word[i] for i in sequences[0]])
seq_lens = [len(s) for s in sequences]

MAX_SEQUENCE_LENGTH = 100
# pad sequences with 0s
x_train = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
x_test = pad_sequences(sequences_test, maxlen=MAX_SEQUENCE_LENGTH)
#print('Shape of data train:', x_train.shape)  # this gave (1, 100)
#print('Shape of data test tensor:', x_test.shape)
y_train = train_y
y_test = test_y
print('Shape of label tensor:', y_train.shape)
EMBEDDING_DIM = 32
N_CLASSES = 2

y_train = keras.utils.to_categorical(y_train, N_CLASSES)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='float32')

embedding_layer = Embedding(MAX_NB_WORDS, EMBEDDING_DIM,
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=True)
embedded_sequences = embedding_layer(sequence_input)

average = GlobalAveragePooling1D()(embedded_sequences)
predictions = Dense(N_CLASSES, activation='softmax')(average)

model = Model(sequence_input, predictions)
model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['acc'])
model.fit(x_train, y_train, validation_split=0.1,
          nb_epoch=10, batch_size=1)
output_test = model.predict(x_test)
print("test auc:", roc_auc_score(y_test, output_test[:, 1]))


`



















$endgroup$












  • $begingroup$
    Print the shapes of the feature and label arrays, and add the output to the question.
    $endgroup$
    – Shubham Panchal
    23 hours ago










  • $begingroup$
    Shape of label tensor: (0, 1). y_train.shape[0] = 0, x_train.shape[0] = 1. x_train displays as [1 4 6 7 7 ......] and y_train displays as [ ].
    $endgroup$
    – Kikio
    20 hours ago












  • $begingroup$
    When I comment out this line: #y_train = keras.utils.to_categorical( y_train , N_CLASSES ) the error changes to: ValueError: Error when checking target: expected dense_1 with shape (2,), but got array with shape (1,). So there is still a problem with the shapes.
    $endgroup$
    – Kikio
    20 hours ago










  • $begingroup$
    Don't remove that line. Look: y_train has 0 samples and x_train has 1 sample. Both must have the same number of samples.
    $endgroup$
    – Shubham Panchal
    20 hours ago
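
The point made in the comments, that the features and the labels must contain the same number of samples, can be illustrated with a small sketch. It is plain Python with no Keras, and it rests on assumptions that the question does not confirm: that `data.txt` holds one space-separated integer sequence per line with no header row, and that there is one class label per sequence coming from some separate source. `raw_lines`, `labels`, and all helper names below are hypothetical. Two documented behaviours may be relevant to the reported shapes: `pd.read_csv` treats the first line as a header by default, and iterating over a DataFrame yields its column labels rather than its rows, so a tokenizer fed a DataFrame may see only one "text" per column. The code also passes the same frame to `train_test_split` as both features and labels, so no separate label column is ever used.

```python
# Sketch only: the file layout and the label source are assumptions,
# not details taken from the question.

def parse_sequences(lines):
    """Turn each non-empty line of space-separated integers into a list of ints."""
    return [[int(tok) for tok in line.split()] for line in lines if line.strip()]


def pad_left(seq, maxlen, value=0):
    """Keep the last `maxlen` items and left-pad with `value` (maxlen must be > 0)."""
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq


def one_hot(labels, n_classes):
    """Encode integer class labels as rows of length `n_classes`."""
    return [[1 if k == label else 0 for k in range(n_classes)] for label in labels]


def check_same_samples(x, y):
    """Raise early if features and targets disagree on the number of samples."""
    if len(x) != len(y):
        raise ValueError(f"{len(x)} input samples but {len(y)} target samples")


# Hypothetical stand-in for the contents of data.txt (no header row).
raw_lines = ["1 4 6 7 7", "12 340 5", "8 8 2 19"]
labels = [0, 1, 0]  # hypothetical: one class label per sequence

sequences = parse_sequences(raw_lines)
x = [pad_left(s, maxlen=6) for s in sequences]
y = one_hot(labels, n_classes=2)

check_same_samples(x, y)   # passes: 3 samples on each side
print(len(x), len(x[0]))   # 3 6
print(len(y), len(y[0]))   # 3 2
```

Running the same kind of check on `x_train` and `y_train` right before `model.fit` would surface the 1-versus-0 mismatch at the point where the data is prepared rather than inside Keras.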























python neural-network keras nlp
















asked yesterday









Kikio





















