Improve results using user input
$begingroup$
I've developed a tool that retrieves the closest expressions from a database based on what the user typed. It uses word embeddings: each expression in the database is compared with the user input.
The top n results are retrieved, but the closest expressions are not necessarily the most relevant ones.
For example, typing: hospital machine
The top results will be "dialysis machine", "medical machine", ... but I'll also find expressions like "building machine" and "office machine".
A user will most likely choose a medicine-related machine.
Is there a way to optimize my ranking system based on the user input while still using the similarity between the expression vectors?
machine-learning word-embeddings ranking
$endgroup$
$begingroup$
Are you asking about improving your tool online (updating with every addition of new data)? It seems like you want to track clicks to build a belief about what's relevant, no?
$endgroup$
– Alex L
yesterday
$begingroup$
Yes. Using clicks, for example, I'd like to reinforce the relevance of the results. If my current system ranks Expression A first, but the second-best result, Expression B, is consistently selected instead of (or more often than) Expression A, then Expression B should become the first result. However, I'd like to keep using the similarity between the expression vectors for the ranking; the click/relevance feedback from users would be an improvement on top of the current system. I'm not sure I've made myself clear, as I'm fairly new to the domain.
$endgroup$
– Martin
yesterday
asked 2 days ago
Martin
1 Answer
$begingroup$
Understanding the similarity between two phrases has two aspects:
1. How similar are the unique tokens in the phrases?
2. How much should each individual token contribute to the overall phrase similarity?
To answer 1, you can use vector similarity, which gives high similarity for tokens with similar meanings. To answer 2, you should assign importance weights to the tokens, using a measure like tf-idf. When comparing "hospital machine" and "building machine", the word "machine", being frequent in your corpus, should get a lower weight and hence contribute less to the overall similarity. Most of the similarity would then be determined by the similarity between "hospital" and "building", which would solve your issue.
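As a rough illustration of the weighting idea, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the 3-dimensional word vectors are invented toy values (a real system would load pretrained embeddings), the corpus is just the four expressions from the question, and the names `VECTORS`, `CORPUS`, `idf_weights`, `phrase_vector` and `rank` are hypothetical. Since the phrases are very short, term frequency is effectively 1, so only the idf part of tf-idf is used, in a smoothed form.

```python
import math
from collections import Counter

import numpy as np

# Toy 3-d word vectors, invented for illustration only.
# A real system would load pretrained embeddings instead.
VECTORS = {
    "hospital": np.array([0.90, 0.10, 0.00]),
    "medical":  np.array([0.80, 0.20, 0.10]),
    "dialysis": np.array([0.85, 0.05, 0.10]),
    "building": np.array([0.10, 0.90, 0.00]),
    "office":   np.array([0.20, 0.80, 0.10]),
    "machine":  np.array([0.30, 0.30, 0.90]),
}

CORPUS = ["dialysis machine", "medical machine",
          "building machine", "office machine"]


def idf_weights(corpus):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(corpus)
    df = Counter(tok for phrase in corpus for tok in set(phrase.split()))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def phrase_vector(phrase, idf, default_idf):
    """idf-weighted average of the word vectors of the known tokens."""
    toks = [t for t in phrase.split() if t in VECTORS]
    if not toks:
        raise ValueError(f"no known tokens in {phrase!r}")
    weights = np.array([idf.get(t, default_idf) for t in toks])
    vecs = np.stack([VECTORS[t] for t in toks])
    return weights @ vecs / weights.sum()


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank(query, corpus, idf):
    """Return (similarity, phrase) pairs, most similar first."""
    # Tokens never seen in the corpus are treated as rare (highest idf).
    default_idf = max(idf.values())
    q = phrase_vector(query, idf, default_idf)
    scored = [(cosine(q, phrase_vector(p, idf, default_idf)), p)
              for p in corpus]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, phrase in rank("hospital machine", CORPUS, idf_weights(CORPUS)):
        print(f"{score:.3f}  {phrase}")
```

Because "machine" appears in every expression, it receives the smallest weight, so the distinguishing token ("hospital", "building", ...) dominates each phrase vector. With these toy vectors the two medical expressions rank above "building machine" and "office machine"; with real embeddings the size of the effect will depend on the corpus statistics.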
$endgroup$
$begingroup$
Okay, thank you! I'll have a look at these measures (tf-idf) for point 2. I also want user input (through clicks, selection order, or a rating) to influence the results. The ranking should offer the most similar expressions, but also the most "selected" ones. I wonder whether that is possible?
$endgroup$
– Martin
yesterday
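One possible way to combine the two signals described in this comment is to blend the embedding similarity with a smoothed click-through rate. The sketch below is only an assumption about how this could be done, not a method proposed anywhere in this thread: the class `ClickAwareRanker`, the blending weight `alpha`, and the prior pseudo-counts are all hypothetical choices, and feedback is stored per exact query string for simplicity.

```python
from collections import Counter


class ClickAwareRanker:
    """Blend embedding similarity with observed click feedback (a sketch)."""

    def __init__(self, alpha=0.7, prior_clicks=1.0, prior_views=2.0):
        # alpha: share of the final score taken from vector similarity.
        # prior_*: pseudo-counts so that unseen items start at a neutral
        # click rate of prior_clicks / prior_views (0.5 by default).
        self.alpha = alpha
        self.prior_clicks = prior_clicks
        self.prior_views = prior_views
        self.views = Counter()
        self.clicks = Counter()

    def record(self, query, shown, clicked):
        """Log one interaction: which results were shown, which was chosen."""
        for expr in shown:
            self.views[(query, expr)] += 1
        self.clicks[(query, clicked)] += 1

    def click_rate(self, query, expr):
        key = (query, expr)
        return ((self.clicks[key] + self.prior_clicks)
                / (self.views[key] + self.prior_views))

    def rerank(self, query, candidates):
        """candidates: list of (expression, similarity in [0, 1]) pairs."""
        scored = [
            (self.alpha * sim
             + (1 - self.alpha) * self.click_rate(query, expr), expr)
            for expr, sim in candidates
        ]
        return [expr for _, expr in sorted(scored, reverse=True)]


if __name__ == "__main__":
    ranker = ClickAwareRanker()
    candidates = [("Expression A", 0.90), ("Expression B", 0.88)]
    print(ranker.rerank("some query", candidates))  # similarity decides
    for _ in range(20):  # users keep choosing B over A
        ranker.record("some query",
                      ["Expression A", "Expression B"], "Expression B")
    print(ranker.rerank("some query", candidates))  # feedback now matters
```

With no feedback, both candidates share the same neutral click rate, so the order is determined by similarity alone. Once Expression B has been selected repeatedly, its click rate rises and it can overtake Expression A, while similarity still contributes most of the score. How much weight to give clicks (`alpha`) is a tuning decision; clicks are also biased toward whatever is displayed first, which this sketch does not correct for.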
answered 2 days ago
Gyan Ranjan