Error: ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples...

I'm working on a system-call classification task. The code below is adapted from a text classification project. My system calls are represented as sequences of integers between 1 and 340. The error I get is:

ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 0 target samples.

This is my first time doing this and I don't know what to do. Thank you in advance.
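The message comes from a check Keras makes before training: the first dimension (the number of samples) of the input array must equal the first dimension of the target array. A minimal NumPy sketch of that check follows, assuming arrays shaped like the ones reported in the comments (`x_train` of shape `(1, 100)`, `y_train` of shape `(0, 1)`). The function `check_sample_counts` is hypothetical and is not Keras's own code.

```python
import numpy as np

def check_sample_counts(x, y):
    """Hypothetical re-creation of the sample-count check that fit() performs."""
    n_x, n_y = x.shape[0], y.shape[0]
    if n_x != n_y:
        raise ValueError(
            "Input arrays should have the same number of samples as target "
            f"arrays. Found {n_x} input samples and {n_y} target samples."
        )
    return n_x

# shapes reported in the comments: one padded input row, an empty label array
x_train = np.zeros((1, 100), dtype="int32")
y_train = np.zeros((0, 1))

try:
    check_sample_counts(x_train, y_train)
except ValueError as err:
    print(err)
```

Training can only start once both arrays have the same number of rows, so the thing to trace is why the labels end up empty while the inputs have a single row.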

    import pandas as pd
    import keras
    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences
    from keras.layers import Input, Embedding, GlobalAveragePooling1D, Dense
    from keras.models import Model
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("data.txt")
    df_test = pd.read_csv("validation.txt")

    # split into train and test data (a single hold-out split, not k-fold cross-validation);
    # note that the whole DataFrame is passed as both the features and the labels
    train_text, test_text, train_y, test_y = train_test_split(df, df, test_size=0.2)

    MAX_NB_WORDS = 5700

    # get the raw text data
    texts_train = train_text.astype(str)
    texts_test = test_text.astype(str)

    # finally, vectorize the text samples into a 2D integer tensor
    tokenizer = Tokenizer(num_words=MAX_NB_WORDS, char_level=False)
    tokenizer.fit_on_texts(texts_train)
    sequences = tokenizer.texts_to_sequences(texts_train)
    sequences_test = tokenizer.texts_to_sequences(texts_test)

    # inspect the learned vocabulary and the first training sequence
    word_index = tokenizer.word_index
    print(type(word_index), len(word_index))
    index_to_word = dict((i, w) for w, i in word_index.items())
    print(" ".join([index_to_word[i] for i in sequences[0]]))
    seq_lens = [len(s) for s in sequences]

    MAX_SEQUENCE_LENGTH = 100

    # pad sequences with 0s (at the start, which is the pad_sequences default)
    x_train = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    x_test = pad_sequences(sequences_test, maxlen=MAX_SEQUENCE_LENGTH)
    #print('Shape of data train:', x_train.shape)  # this gave (1, 100)
    #print('Shape of data test tensor:', x_test.shape)

    y_train = train_y
    y_test = test_y
    print('Shape of label tensor:', y_train.shape)

    EMBEDDING_DIM = 32
    N_CLASSES = 2

    # one-hot encode the labels: (n_samples,) or (n_samples, 1) -> (n_samples, N_CLASSES)
    y_train = keras.utils.to_categorical(y_train, N_CLASSES)

    # input: padded sequences of integer token ids
    sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')

    embedding_layer = Embedding(MAX_NB_WORDS, EMBEDDING_DIM,
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=True)
    embedded_sequences = embedding_layer(sequence_input)

    # average the embeddings over the sequence, then classify with a softmax
    average = GlobalAveragePooling1D()(embedded_sequences)
    predictions = Dense(N_CLASSES, activation='softmax')(average)

    model = Model(sequence_input, predictions)
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['acc'])
    model.fit(x_train, y_train, validation_split=0.1,
              epochs=10, batch_size=1)

    output_test = model.predict(x_test)
    print("test auc:", roc_auc_score(y_test, output_test[:, 1]))
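One thing worth checking, assuming `data.txt` holds one space-separated sequence of system-call numbers per line with no header row (the file itself is not shown in the question): `pd.read_csv` treats the first line as column names by default, and iterating over a DataFrame, which is what `fit_on_texts` and `texts_to_sequences` do with `texts_train`, yields the column labels rather than the rows. Together these could explain why `x_train` ends up with exactly one sample. The sketch below uses a small hypothetical in-memory file to show both effects; the column name `calls` is also hypothetical.

```python
import io

import pandas as pd

# hypothetical file contents: one sequence of system-call numbers per line, no header
raw = "1 4 6 7 7\n3 5 2 9 1\n8 8 2 4 6\n"

# default read_csv: the first line is consumed as the column name
df_default = pd.read_csv(io.StringIO(raw))
print(df_default.columns.tolist())  # ['1 4 6 7 7']
print(len(df_default))              # 2 (only the remaining lines are rows)

# iterating over a DataFrame yields its column labels, not its rows
print(list(df_default))             # ['1 4 6 7 7']

# header=None keeps every line as a row; select the column as a Series
df_fixed = pd.read_csv(io.StringIO(raw), header=None, names=["calls"])
texts = df_fixed["calls"].astype(str)
print(len(texts))                   # 3
print(list(texts)[0])               # 1 4 6 7 7
```

Passing a Series of strings like `texts` to the tokenizer gives one sequence per line. The labels would likewise need to come from their own column or file instead of passing `df` twice to `train_test_split`; where the labels are stored is not shown in the question.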

asked yesterday by Kikio (264 reputation)

Print the output shapes of the feature and the label arrays, and attach the output to the question. – Shubham Panchal, 23 hours ago

Shape of label tensor: (0, 1). y_train.shape[0] = 0, x_train.shape[0] = 1. x_train display: [1 4 6 7 7 ......], y_train display: [ ] – Kikio, 20 hours ago

When I remove (comment out) this line: `y_train = keras.utils.to_categorical(y_train, N_CLASSES)`, the error changes to: ValueError: Error when checking target: expected dense_1 to have shape (2,) but got array with shape (1,). So there is still a problem with the shapes. – Kikio, 20 hours ago

Don't remove that line. Look: there are 0 samples in y_train and 1 sample in x_train. Both must have the same number of samples. – Shubham Panchal, 20 hours ago
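The second error follows from the output layer: `Dense(N_CLASSES, activation='softmax')` produces two values per sample, so with `categorical_crossentropy` each target must be a one-hot row of length 2, which is what `to_categorical` builds from integer labels. Without it, the targets keep shape `(n_samples, 1)`. A NumPy sketch of the same conversion follows, with hypothetical labels; the helper `one_hot` is an illustration, not Keras's implementation.

```python
import numpy as np

N_CLASSES = 2

def one_hot(labels, n_classes):
    """Turn integer class labels of shape (n,) or (n, 1) into one-hot rows of shape (n, n_classes)."""
    labels = np.asarray(labels, dtype=int).ravel()
    return np.eye(n_classes, dtype="float32")[labels]

y_raw = np.array([[0], [1], [1]])         # hypothetical labels, shape (3, 1)
y_onehot = one_hot(y_raw, N_CLASSES)
print(y_raw.shape, "->", y_onehot.shape)  # (3, 1) -> (3, 2)

# an empty label array stays empty after encoding,
# so the sample count still does not match x_train
print(one_hot(np.zeros((0, 1)), N_CLASSES).shape)  # (0, 2)
```

This is consistent with the last comment: the one-hot step is needed for this model, and the mismatch comes from the label array already being empty before it.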

python neural-network keras nlp
