Improve results using user input

I've developed a tool that retrieves the closest expressions from a database based on what the user types. It uses word embeddings: each expression in the database is compared with the user input.

The top n results are retrieved, but the closest expressions are not necessarily the most relevant ones.

For example, if I type "hospital machine", the top results are "dialysis machine", "medical machine", ..., but I also get expressions like "building machine" and "office machine".

A user will most likely choose a medicine-related machine.

Is there a way to optimize my ranking system based on user input while still using the similarity between the expressions' vectors?
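
For reference, here is a minimal sketch of the setup described above. It assumes each expression is represented by the unweighted average of its word vectors and compared with the input by cosine similarity. The 3-dimensional vectors are hypothetical toy values standing in for a real embedding model, and the function names are illustrative.

```python
import math

# Hypothetical toy word vectors standing in for a real embedding model.
WORD_VECTORS = {
    "hospital": [1.0, 0.0, 0.0],
    "dialysis": [0.9, 0.0, 0.1],
    "medical": [0.95, 0.05, 0.0],
    "building": [0.0, 1.0, 0.0],
    "office": [0.1, 0.9, 0.0],
    "machine": [0.0, 0.0, 1.0],
}
EXPRESSIONS = ["dialysis machine", "medical machine", "building machine", "office machine"]


def phrase_vector(phrase):
    """Unweighted average of the vectors of the known words in a phrase."""
    vectors = [WORD_VECTORS[w] for w in phrase.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {phrase!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query, expressions, top_n=4):
    """Compare the query with every expression and keep the top_n by cosine similarity."""
    q = phrase_vector(query)
    scored = [(e, cosine(q, phrase_vector(e))) for e in expressions]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


for expression, score in rank("hospital machine", EXPRESSIONS):
    print(f"{score:.3f}  {expression}")
```

With these toy vectors, "building machine" still scores about 0.5 against "hospital machine" purely through the shared word "machine", which is how unrelated expressions end up among the top n results.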

Tags: machine-learning, word-embeddings, ranking

asked 2 days ago by Martin

  • Are you asking about improving your tool online (updating it with every addition of new data)? It seems like you want to track clicks to build a belief about what's relevant, no?
    – Alex L, yesterday

  • Yes. Using clicks, for example, I'd like to reinforce the relevance of the results. If my current system ranks Expression A first, but the second-best result, Expression B, is always selected before (or more often than) Expression A, then Expression B should become the first result. However, I'd still like to use the similarity between the expressions' vectors for the ranking; click/user relevance would be an improvement on top of the current system. Not sure if I made myself understood; I'm fairly new to the domain.
    – Martin, yesterday
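
One simple way to add the click feedback described here while keeping the vector similarity is to blend the two: final score = (1 - alpha) * similarity + alpha * CTR, where the click-through rate (CTR) of a (query, expression) pair is smoothed toward a prior so that a handful of clicks cannot dominate. This is a hypothetical sketch; the class name, `alpha`, and the prior values are illustrative assumptions that would need tuning on real data.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Blend embedding similarity with smoothed click-through rates (hypothetical sketch)."""

    def __init__(self, alpha=0.3, prior_clicks=1.0, prior_impressions=10.0):
        self.alpha = alpha                          # weight of click feedback vs. similarity
        self.prior_clicks = prior_clicks            # pseudo-counts: an unseen pair
        self.prior_impressions = prior_impressions  # starts at a CTR of 1/10
        self.impressions = defaultdict(int)         # (query, expression) -> times shown
        self.clicks = defaultdict(int)              # (query, expression) -> times chosen

    def record(self, query, shown, clicked):
        """Log one search: each shown expression gets an impression, the chosen one a click."""
        for expression in shown:
            self.impressions[(query, expression)] += 1
        self.clicks[(query, clicked)] += 1

    def ctr(self, query, expression):
        """Click-through rate, smoothed toward prior_clicks / prior_impressions."""
        key = (query, expression)
        return (self.clicks[key] + self.prior_clicks) / (
            self.impressions[key] + self.prior_impressions
        )

    def rerank(self, query, scored):
        """Re-rank (expression, similarity) pairs produced by the embedding step."""
        blended = [
            (expression, (1 - self.alpha) * similarity + self.alpha * self.ctr(query, expression))
            for expression, similarity in scored
        ]
        return sorted(blended, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores: Expression A is slightly closer in vector space.
similarity = [("Expression A", 0.92), ("Expression B", 0.90)]
ranker = ClickFeedbackRanker()
print(ranker.rerank("hospital machine", similarity)[0][0])  # no feedback yet: Expression A

# Users keep choosing Expression B for this query.
for _ in range(20):
    ranker.record("hospital machine", ["Expression A", "Expression B"], clicked="Expression B")
print(ranker.rerank("hospital machine", similarity)[0][0])  # now Expression B
```

The counts update with every search, so the ranking adapts online, as suggested in the first comment. Two caveats: keying the counts on the exact query string means differently worded queries do not share feedback, and results shown higher up tend to be clicked more simply because of their position, so raw clicks can overstate the relevance of top results.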

1 Answer

Understanding the similarity between two phrases has two aspects:

  1. How similar are the distinct tokens in the phrases?

  2. How much should each individual token contribute to the overall phrase similarity?

To answer 1, you can use vector similarity, which gives high similarity to tokens that are similar in meaning. To answer 2, you should look at assigning importance weights to the tokens, using a measure such as tf-idf. When comparing "hospital machine" and "building machine", "machine" is a frequent word in your corpus, so it should get a lower weight and contribute less to the overall similarity. Most of the similarity would then be determined by the similarity between "hospital" and "building", which should solve your issue.
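
A minimal sketch of this idea, assuming the expressions themselves are the corpus and each phrase vector is an idf-weighted average of its word vectors. Because the expressions are only a few words long, the tf part is almost always 1, so only the idf part is used here, in the smoothed form idf(t) = ln((1 + n) / (1 + df(t))) + 1 that scikit-learn's `TfidfVectorizer` applies by default. The word vectors and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy word vectors standing in for a real embedding model.
WORD_VECTORS = {
    "hospital": [1.0, 0.0, 0.0],
    "dialysis": [0.9, 0.0, 0.1],
    "medical": [0.95, 0.05, 0.0],
    "building": [0.0, 1.0, 0.0],
    "office": [0.1, 0.9, 0.0],
    "machine": [0.0, 0.0, 1.0],
}
EXPRESSIONS = ["dialysis machine", "medical machine", "building machine", "office machine"]


def smoothed_idf(corpus):
    """Return idf(t) = ln((1 + n) / (1 + df(t))) + 1 computed over the corpus phrases."""
    n = len(corpus)
    # Document frequency: in how many phrases each word occurs (counted once per phrase).
    df = Counter(word for phrase in corpus for word in set(phrase.lower().split()))
    return lambda word: math.log((1 + n) / (1 + df[word])) + 1


def weighted_phrase_vector(phrase, idf):
    """idf-weighted average of word vectors: frequent words contribute less."""
    words = [w for w in phrase.lower().split() if w in WORD_VECTORS]
    if not words:
        raise ValueError(f"no known words in {phrase!r}")
    total = sum(idf(w) for w in words)
    dim = len(WORD_VECTORS[words[0]])
    return [sum(idf(w) * WORD_VECTORS[w][i] for w in words) / total for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_weighted(query, expressions):
    idf = smoothed_idf(expressions)
    q = weighted_phrase_vector(query, idf)
    scored = [(e, cosine(q, weighted_phrase_vector(e, idf))) for e in expressions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for expression, score in rank_weighted("hospital machine", EXPRESSIONS):
    print(f"{score:.3f}  {expression}")
```

Since "machine" appears in every expression, it gets the smallest weight, and in this toy setup the score of "building machine" for the query "hospital machine" falls below 0.2 (a plain average of the same vectors gives 0.5), while "medical machine" and "dialysis machine" stay above 0.95.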

answered 2 days ago by Gyan Ranjan

  • Okay, thank you! I'll look into these measures (tf-idf) for point 2. I also want user input (through clicks, selection order, or a rating) to influence the results. The ranking should offer the most similar expressions but also the most "selected" ones. Is it possible to do so?
    – Martin, yesterday