How to get the similar sounding words together
I am trying to group all the similar-sounding words in a list.
I tried to do it with cosine similarity, but that does not serve my purpose:
from sklearn.metrics.pairwise import cosine_similarity
dataList = ['two','fourth','forth','dessert','to','desert']
cosine_similarity(dataList)  # fails: cosine_similarity expects numeric feature vectors, not raw strings
I know this is not the right approach, but I can't seem to get a result like:
result = ['xx', 'xx', 'yy', 'yy', 'zz', 'zz']
where words that sound similar share the same label.
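Why cosine similarity misses homophones can be shown with a small sketch. sklearn's `cosine_similarity` only works on numeric vectors, and the question doesn't say how the words were vectorized, so this assumes each word is turned into character-bigram counts (the `bigrams` and `cosine` helpers are hypothetical, written in plain Python). Under that assumption, similarity tracks spelling rather than sound: "two" and "to" share no bigram and score 0.

```python
from collections import Counter
from math import sqrt

def bigrams(word):
    """Count the overlapping two-letter substrings of `word`."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))

def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a)  # a missing key in a Counter counts as 0
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

for w1, w2 in [('two', 'to'), ('fourth', 'forth'), ('dessert', 'desert')]:
    print(w1, w2, round(cosine(bigrams(w1), bigrams(w2)), 2))
```

This prints 0.0 for "two"/"to", 0.67 for "fourth"/"forth" and 0.91 for "dessert"/"desert": pairs that happen to be spelled alike score high, but the pure homophone pair gets nothing, which is why a phonetic encoding is needed instead.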
python python-3.x list
asked 1 hour ago by Marc Stoch, edited 59 mins ago by DirtyBit
1 Answer
First, you need a proper way to find words that sound alike, i.e. a phonetic encoding such as Soundex rather than a spelling-based similarity. I would suggest:
Using jellyfish:
from jellyfish import soundex
print(soundex("two"))
print(soundex("to"))
OUTPUT:
T000
T000
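To see why "two" and "to" collapse to the same code, here is a minimal pure-Python sketch of American Soundex. It assumes the standard letter-to-digit table; it is not jellyfish's actual code (edge cases may differ), and the `soundex` name and `length` parameter here are illustrative.

```python
# Standard American Soundex digit for each consonant group.
# Vowels (and y) get no digit; h and w are skipped entirely.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word: str, length: int = 4) -> str:
    """Return the American Soundex code of `word`, e.g. 'two' -> 'T000'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's digit still blocks an immediate repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w are skipped without resetting prev, so same digits around them merge
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev to "", so a repeated digit after it is coded again
    # Keep the first letter, then pad with zeros / truncate to `length` characters
    return (first.upper() + "".join(digits) + "0" * length)[:length]

for w in ['two', 'fourth', 'forth', 'dessert', 'to', 'desert']:
    print(w, soundex(w))
```

For this list it yields T000 for "two" and "to", F630 for "fourth" and "forth", and D263 for "dessert" and "desert", matching the codes jellyfish produces here. Because vowels and h/w are dropped, "two" reduces to just its first letter plus padding.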
Next, create a function that encodes every word in the list, then sort the codes so that matching ones end up next to each other:
from jellyfish import soundex
def soundex_codes(dList):
    # Encode every word; words that sound alike get the same code,
    # e.g. ['T000', 'F630', 'F630', 'D263', 'T000', 'D263'] for dataList below
    return [soundex(x) for x in dList]
dataList = ['two','fourth','forth','dessert','to','desert']
res = soundex_codes(dataList)
print(sorted(res))
OUTPUT:
['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
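The sorted codes show which codes repeat, but they drop the words themselves. To get per-word labels like the `result` in the question, one option is to group the words by code. This is a minimal sketch that assumes the codes are already computed; they are hard-coded from the code comment above so it runs without jellyfish installed. `group_by_code` and `label_by_code` are hypothetical helper names, and the labels are integers rather than the question's 'xx'/'yy'/'zz' strings.

```python
from collections import defaultdict

# Assumed input: Soundex codes for dataList, hard-coded here.
# With jellyfish you would compute them as [soundex(w) for w in dataList].
dataList = ['two', 'fourth', 'forth', 'dessert', 'to', 'desert']
codes = ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']

def group_by_code(words, codes):
    """Map each Soundex code to the words that share it, in input order."""
    groups = defaultdict(list)
    for word, code in zip(words, codes):
        groups[code].append(word)
    return dict(groups)

def label_by_code(codes):
    """Number each distinct code in first-seen order, giving every word a
    group label (like xx/yy/zz in the question)."""
    ids = {}
    # len(ids) is evaluated before insertion, so each new code gets the next number
    return [ids.setdefault(code, len(ids)) for code in codes]

print(group_by_code(dataList, codes))
# {'T000': ['two', 'to'], 'F630': ['fourth', 'forth'], 'D263': ['dessert', 'desert']}
print(label_by_code(codes))
# [0, 1, 1, 2, 0, 2]
```

The labels stay aligned with the original word order, so `label_by_code(codes)[i]` is the group of `dataList[i]`; sorting by label would reproduce the grouped layout the question sketches.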
EDIT:
Another option is the fuzzy library:
import fuzzy
soundex = fuzzy.Soundex(4)  # 4 = length of the generated code
print(soundex("to"))
print(soundex("two"))
OUTPUT:
T000
T000
lib link please :) – Nihal, 1 hour ago
@Nihal updated! :) – DirtyBit, 57 mins ago
answered 1 hour ago by DirtyBit, edited 50 mins ago