What are the performance impacts of 'functional' Rust?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I am following the Rust track on Exercism.io. I have a fair amount of C/C++ experience. I like the 'functional' elements of Rust but I'm concerned about the relative performance.
I solved the 'run length encoding' problem:
pub fn encode(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
I noticed that one of the top-rated answers looked more like this:
extern crate itertools;
use itertools::Itertools;
pub fn encode(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
I love the top rated solution; it is simple, functional, and elegant. This is what they promised me Rust would be all about. Mine on the other hand is gross and full of mutable variables. You can tell I'm used to C++.
My problem is that the functional style has a SIGNIFICANT performance impact. I tested both versions with the same 4MB of random data encoded 1000 times. My imperative solution took under 10 seconds; the functional solution was ~2mins30seconds.
- Why is the functional style so much slower than the imperative style?
- Is there some problem with the functional implementation which is causing such a huge slowdown?
- If I want to write high performance code, should I ever use this functional style?
functional-programming rust imperative-programming
New contributor
add a comment |
I am following the Rust track on Exercism.io. I have a fair amount of C/C++ experience. I like the 'functional' elements of Rust but I'm concerned about the relative performance.
I solved the 'run length encoding' problem:
pub fn encode(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
I noticed that one of the top-rated answers looked more like this:
extern crate itertools;
use itertools::Itertools;
pub fn encode(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
I love the top rated solution; it is simple, functional, and elegant. This is what they promised me Rust would be all about. Mine on the other hand is gross and full of mutable variables. You can tell I'm used to C++.
My problem is that the functional style has a SIGNIFICANT performance impact. I tested both versions with the same 4MB of random data encoded 1000 times. My imperative solution took under 10 seconds; the functional solution was ~2mins30seconds.
- Why is the functional style so much slower than the imperative style?
- Is there some problem with the functional implementation which is causing such a huge slowdown?
- If I want to write high performance code, should I ever use this functional style?
functional-programming rust imperative-programming
New contributor
The difference looks extremely surprising to me; that's a factor of x15! Have you checked that both implementations yield the same result?
– Matthieu M.
11 hours ago
@MatthieuM. yep, or at least both functions pass all unit tests defined by exercism.
– David Copernicus Bowie
10 hours ago
1
I am thinking that there should be a way to replace themap
step with aflat_map
step, with a special-purpose iterator implementation taking the character and count and outputting the required stream of bytes. Forward encoding the integer is a bit tricky, but not too bad withcount_leading_zeroes
giving a hint of the magnitude (clz(i) * 77 / 256
gives the log 10).
– Matthieu M.
10 hours ago
add a comment |
I am following the Rust track on Exercism.io. I have a fair amount of C/C++ experience. I like the 'functional' elements of Rust but I'm concerned about the relative performance.
I solved the 'run length encoding' problem:
pub fn encode(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
I noticed that one of the top-rated answers looked more like this:
extern crate itertools;
use itertools::Itertools;
pub fn encode(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
I love the top rated solution; it is simple, functional, and elegant. This is what they promised me Rust would be all about. Mine on the other hand is gross and full of mutable variables. You can tell I'm used to C++.
My problem is that the functional style has a SIGNIFICANT performance impact. I tested both versions with the same 4MB of random data encoded 1000 times. My imperative solution took under 10 seconds; the functional solution was ~2mins30seconds.
- Why is the functional style so much slower than the imperative style?
- Is there some problem with the functional implementation which is causing such a huge slowdown?
- If I want to write high performance code, should I ever use this functional style?
functional-programming rust imperative-programming
New contributor
I am following the Rust track on Exercism.io. I have a fair amount of C/C++ experience. I like the 'functional' elements of Rust but I'm concerned about the relative performance.
I solved the 'run length encoding' problem:
pub fn encode(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
I noticed that one of the top-rated answers looked more like this:
extern crate itertools;
use itertools::Itertools;
pub fn encode(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
I love the top rated solution; it is simple, functional, and elegant. This is what they promised me Rust would be all about. Mine on the other hand is gross and full of mutable variables. You can tell I'm used to C++.
My problem is that the functional style has a SIGNIFICANT performance impact. I tested both versions with the same 4MB of random data encoded 1000 times. My imperative solution took under 10 seconds; the functional solution was ~2mins30seconds.
- Why is the functional style so much slower than the imperative style?
- Is there some problem with the functional implementation which is causing such a huge slowdown?
- If I want to write high performance code, should I ever use this functional style?
functional-programming rust imperative-programming
functional-programming rust imperative-programming
New contributor
New contributor
edited 10 hours ago
Shepmaster
162k16333478
162k16333478
New contributor
asked 11 hours ago
David Copernicus BowieDavid Copernicus Bowie
383
383
New contributor
New contributor
The difference looks extremely surprising to me; that's a factor of x15! Have you checked that both implementations yield the same result?
– Matthieu M.
11 hours ago
@MatthieuM. yep, or at least both functions pass all unit tests defined by exercism.
– David Copernicus Bowie
10 hours ago
1
I am thinking that there should be a way to replace themap
step with aflat_map
step, with a special-purpose iterator implementation taking the character and count and outputting the required stream of bytes. Forward encoding the integer is a bit tricky, but not too bad withcount_leading_zeroes
giving a hint of the magnitude (clz(i) * 77 / 256
gives the log 10).
– Matthieu M.
10 hours ago
add a comment |
The difference looks extremely surprising to me; that's a factor of x15! Have you checked that both implementations yield the same result?
– Matthieu M.
11 hours ago
@MatthieuM. yep, or at least both functions pass all unit tests defined by exercism.
– David Copernicus Bowie
10 hours ago
1
I am thinking that there should be a way to replace themap
step with aflat_map
step, with a special-purpose iterator implementation taking the character and count and outputting the required stream of bytes. Forward encoding the integer is a bit tricky, but not too bad withcount_leading_zeroes
giving a hint of the magnitude (clz(i) * 77 / 256
gives the log 10).
– Matthieu M.
10 hours ago
The difference looks extremely surprising to me; that's a factor of x15! Have you checked that both implementations yield the same result?
– Matthieu M.
11 hours ago
The difference looks extremely surprising to me; that's a factor of x15! Have you checked that both implementations yield the same result?
– Matthieu M.
11 hours ago
@MatthieuM. yep, or at least both functions pass all unit tests defined by exercism.
– David Copernicus Bowie
10 hours ago
@MatthieuM. yep, or at least both functions pass all unit tests defined by exercism.
– David Copernicus Bowie
10 hours ago
1
1
I am thinking that there should be a way to replace the
map
step with a flat_map
step, with a special-purpose iterator implementation taking the character and count and outputting the required stream of bytes. Forward encoding the integer is a bit tricky, but not too bad with count_leading_zeroes
giving a hint of the magnitude (clz(i) * 77 / 256
gives the log 10).– Matthieu M.
10 hours ago
I am thinking that there should be a way to replace the
map
step with a flat_map
step, with a special-purpose iterator implementation taking the character and count and outputting the required stream of bytes. Forward encoding the integer is a bit tricky, but not too bad with count_leading_zeroes
giving a hint of the magnitude (clz(i) * 77 / 256
gives the log 10).– Matthieu M.
10 hours ago
add a comment |
2 Answers
2
active
oldest
votes
TL;DR
A functional implementation can be faster than your original procedural implementation, in certain cases.
Why is the functional style so much slower than the imperative style? Is there some problem with the functional implementation which is causing such a huge slowdown?
As Matthieu M. already pointed out, the important thing to note is that the algorithm matters. How that algorithm is expressed (procedural, imperative, object-oriented, functional, declarative) generally doesn't matter.
I see two main issues with the functional code:
Allocating numerous strings over and over is inefficient. In the original functional implementation, this is done via
to_string
andformat!
.There's the overhead of using
group_by
, which exists to give a nested iterator, which you don't need just to get the counts.
Using more of itertools (batching
, take_while_ref
, format_with
) brings the two implementations much closer:
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
A benchmark of 1KiB of random alphanumeric data:
encode (procedural) time: [4.9922 us 5.0386 us 5.0940 us]
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
encode (fast) time: [6.6025 us 6.6636 us 6.7371 us]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
And 4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [21.082 ms 21.620 ms 22.211 ms]
encode (fast) time: [26.457 ms 27.104 ms 27.882 ms]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
If you are interested in creating your own iterator, you can mix-and-match the procedural code with more functional code:
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next(); // See footnote 1
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
1 — thanks to Stargateur for pointing out that eagerly getting the first value helps branch prediction.
4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [19.888 ms 20.301 ms 20.794 ms]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
encode (tiny) time: [19.150 ms 19.262 ms 19.399 ms]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
I believe this more clearly shows the main fundamental difference between the two implementations: an iterator-based solution is resumable. Every time we call next
, we need to see if there was a previous character that we've read (self.saved
). This adds a branch to the code that isn't there in the procedural code.
On the flip side, the iterator-based solution is more flexible — we can now compose all sorts of transformations on the data, or write directly to a file instead of a String
, etc. The custom iterator can be extended to operate on a generic type instead of char
as well, making it very flexible.
See also:
- How can I add new methods to Iterator?
If I want to write high performance code, should I ever use this functional style?
I would, until benchmarking shows that it's the bottleneck. Then evaluate why it's the bottleneck.
Supporting code
Always got to show your work, right?
benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion}; // 0.2.11
use rle::*;
fn criterion_benchmark(c: &mut Criterion) {
let data = rand_data(4 * 1024 * 1024);
c.bench_function("encode (procedural)", {
let data = data.clone();
move |b| b.iter(|| encode_proc(&data))
});
c.bench_function("encode (functional)", {
let data = data.clone();
move |b| b.iter(|| encode_iter(&data))
});
c.bench_function("encode (fast)", {
let data = data.clone();
move |b| b.iter(|| encode_slim(&data))
});
c.bench_function("encode (tiny)", {
let data = data.clone();
move |b| b.iter(|| encode_tiny(&data))
});
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
lib.rs
use itertools::Itertools; // 0.8.0
use rand; // 0.6.5
pub fn rand_data(len: usize) -> String {
use rand::distributions::{Alphanumeric, Distribution};
let mut rng = rand::thread_rng();
Alphanumeric.sample_iter(&mut rng).take(len).collect()
}
pub fn encode_proc(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
pub fn encode_iter(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next();
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn all_the_same() {
let data = rand_data(1024);
let a = encode_proc(&data);
let b = encode_iter(&data);
let c = encode_slim(&data);
let d = encode_tiny(&data);
assert_eq!(a, b);
assert_eq!(a, c);
assert_eq!(a, d);
}
}
Great iterator!
– Matthieu M.
5 hours ago
add a comment |
Let's review the functional implementation!
Memory Allocations
One of the big issues of the functional style proposed here is the map
method which allocates a lot. Every single character is first mapped to a String
, before being collected.
It also uses the format
machinery, which is known to be relatively slow.
Sometimes, people try way too hard to get a "pure" functional solution, instead:
let mut result = String::new();
for (c, group) in &source.chars().group_by(|&c| c) {
let count = group.count();
if count > 1 {
result.push_str(&count.to_string());
}
result.push(c);
}
is about as verbose, yet only allocates when count > 1
just like your solution does and does not use the format
machinery either.
I would expect a significant performance win compared to the full functional solution, while at the same time still leveraging group_by
for extra readability compared to the full imperative solution. Sometimes, you ought to mix and match!
That certainly gives us a speed boost, but it is still around 3x slower than the imperative version (30s rather than 10s in my tests). In fact, even if I only push a constant letter in that for loop it is still about 14s, so around 50% slower than the imperative version. That leads me to believe that group_by is probably not zero cost for this use case. Answer accepted anyway!
– David Copernicus Bowie
9 hours ago
@DavidCopernicusBowie For complex questions like this, you may want to wait before accepting an answer. See Shepmaster's one below =)
– Paul Stenne
8 hours ago
1
How can I append a formatted string to an existing String?
– Shepmaster
8 hours ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
David Copernicus Bowie is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55675093%2fwhat-are-the-performance-impacts-of-functional-rust%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
TL;DR
A functional implementation can be faster than your original procedural implementation, in certain cases.
Why is the functional style so much slower than the imperative style? Is there some problem with the functional implementation which is causing such a huge slowdown?
As Matthieu M. already pointed out, the important thing to note is that the algorithm matters. How that algorithm is expressed (procedural, imperative, object-oriented, functional, declarative) generally doesn't matter.
I see two main issues with the functional code:
Allocating numerous strings over and over is inefficient. In the original functional implementation, this is done via
to_string
andformat!
.There's the overhead of using
group_by
, which exists to give a nested iterator, which you don't need just to get the counts.
Using more of itertools (batching
, take_while_ref
, format_with
) brings the two implementations much closer:
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
A benchmark of 1KiB of random alphanumeric data:
encode (procedural) time: [4.9922 us 5.0386 us 5.0940 us]
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
encode (fast) time: [6.6025 us 6.6636 us 6.7371 us]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
And 4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [21.082 ms 21.620 ms 22.211 ms]
encode (fast) time: [26.457 ms 27.104 ms 27.882 ms]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
If you are interested in creating your own iterator, you can mix-and-match the procedural code with more functional code:
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next(); // See footnote 1
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
1 — thanks to Stargateur for pointing out that eagerly getting the first value helps branch prediction.
4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [19.888 ms 20.301 ms 20.794 ms]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
encode (tiny) time: [19.150 ms 19.262 ms 19.399 ms]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
I believe this more clearly shows the main fundamental difference between the two implementations: an iterator-based solution is resumable. Every time we call next
, we need to see if there was a previous character that we've read (self.saved
). This adds a branch to the code that isn't there in the procedural code.
On the flip side, the iterator-based solution is more flexible — we can now compose all sorts of transformations on the data, or write directly to a file instead of a String
, etc. The custom iterator can be extended to operate on a generic type instead of char
as well, making it very flexible.
See also:
- How can I add new methods to Iterator?
If I want to write high performance code, should I ever use this functional style?
I would, until benchmarking shows that it's the bottleneck. Then evaluate why it's the bottleneck.
Supporting code
Always got to show your work, right?
benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion}; // 0.2.11
use rle::*;
fn criterion_benchmark(c: &mut Criterion) {
let data = rand_data(4 * 1024 * 1024);
c.bench_function("encode (procedural)", {
let data = data.clone();
move |b| b.iter(|| encode_proc(&data))
});
c.bench_function("encode (functional)", {
let data = data.clone();
move |b| b.iter(|| encode_iter(&data))
});
c.bench_function("encode (fast)", {
let data = data.clone();
move |b| b.iter(|| encode_slim(&data))
});
c.bench_function("encode (tiny)", {
let data = data.clone();
move |b| b.iter(|| encode_tiny(&data))
});
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
lib.rs
use itertools::Itertools; // 0.8.0
use rand; // 0.6.5
pub fn rand_data(len: usize) -> String {
use rand::distributions::{Alphanumeric, Distribution};
let mut rng = rand::thread_rng();
Alphanumeric.sample_iter(&mut rng).take(len).collect()
}
pub fn encode_proc(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
pub fn encode_iter(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next();
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn all_the_same() {
let data = rand_data(1024);
let a = encode_proc(&data);
let b = encode_iter(&data);
let c = encode_slim(&data);
let d = encode_tiny(&data);
assert_eq!(a, b);
assert_eq!(a, c);
assert_eq!(a, d);
}
}
Great iterator!
– Matthieu M.
5 hours ago
add a comment |
TL;DR
A functional implementation can be faster than your original procedural implementation, in certain cases.
Why is the functional style so much slower than the imperative style? Is there some problem with the functional implementation which is causing such a huge slowdown?
As Matthieu M. already pointed out, the important thing to note is that the algorithm matters. How that algorithm is expressed (procedural, imperative, object-oriented, functional, declarative) generally doesn't matter.
I see two main issues with the functional code:
Allocating numerous strings over and over is inefficient. In the original functional implementation, this is done via
to_string
andformat!
.There's the overhead of using
group_by
, which exists to give a nested iterator, which you don't need just to get the counts.
Using more of itertools (batching
, take_while_ref
, format_with
) brings the two implementations much closer:
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
A benchmark of 1KiB of random alphanumeric data:
encode (procedural) time: [4.9922 us 5.0386 us 5.0940 us]
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
encode (fast) time: [6.6025 us 6.6636 us 6.7371 us]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
And 4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [21.082 ms 21.620 ms 22.211 ms]
encode (fast) time: [26.457 ms 27.104 ms 27.882 ms]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
If you are interested in creating your own iterator, you can mix-and-match the procedural code with more functional code:
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next(); // See footnote 1
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
1 — thanks to Stargateur for pointing out that eagerly getting the first value helps branch prediction.
4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [19.888 ms 20.301 ms 20.794 ms]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
encode (tiny) time: [19.150 ms 19.262 ms 19.399 ms]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
I believe this more clearly shows the main fundamental difference between the two implementations: an iterator-based solution is resumable. Every time we call next
, we need to see if there was a previous character that we've read (self.saved
). This adds a branch to the code that isn't there in the procedural code.
On the flip side, the iterator-based solution is more flexible — we can now compose all sorts of transformations on the data, or write directly to a file instead of a String
, etc. The custom iterator can be extended to operate on a generic type instead of char
as well, making it very flexible.
See also:
- How can I add new methods to Iterator?
If I want to write high performance code, should I ever use this functional style?
I would, until benchmarking shows that it's the bottleneck. Then evaluate why it's the bottleneck.
Supporting code
Always got to show your work, right?
benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion}; // 0.2.11
use rle::*;
fn criterion_benchmark(c: &mut Criterion) {
let data = rand_data(4 * 1024 * 1024);
c.bench_function("encode (procedural)", {
let data = data.clone();
move |b| b.iter(|| encode_proc(&data))
});
c.bench_function("encode (functional)", {
let data = data.clone();
move |b| b.iter(|| encode_iter(&data))
});
c.bench_function("encode (fast)", {
let data = data.clone();
move |b| b.iter(|| encode_slim(&data))
});
c.bench_function("encode (tiny)", {
let data = data.clone();
move |b| b.iter(|| encode_tiny(&data))
});
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
lib.rs
use itertools::Itertools; // 0.8.0
use rand; // 0.6.5
pub fn rand_data(len: usize) -> String {
use rand::distributions::{Alphanumeric, Distribution};
let mut rng = rand::thread_rng();
Alphanumeric.sample_iter(&mut rng).take(len).collect()
}
pub fn encode_proc(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
pub fn encode_iter(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next();
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn all_the_same() {
let data = rand_data(1024);
let a = encode_proc(&data);
let b = encode_iter(&data);
let c = encode_slim(&data);
let d = encode_tiny(&data);
assert_eq!(a, b);
assert_eq!(a, c);
assert_eq!(a, d);
}
}
Great iterator!
– Matthieu M.
5 hours ago
add a comment |
TL;DR
A functional implementation can be faster than your original procedural implementation, in certain cases.
Why is the functional style so much slower than the imperative style? Is there some problem with the functional implementation which is causing such a huge slowdown?
As Matthieu M. already pointed out, the important thing to note is that the algorithm matters. How that algorithm is expressed (procedural, imperative, object-oriented, functional, declarative) generally doesn't matter.
I see two main issues with the functional code:
Allocating numerous strings over and over is inefficient. In the original functional implementation, this is done via
to_string
andformat!
.There's the overhead of using
group_by
, which exists to give a nested iterator, which you don't need just to get the counts.
Using more of itertools (batching
, take_while_ref
, format_with
) brings the two implementations much closer:
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
A benchmark of 1KiB of random alphanumeric data:
encode (procedural) time: [4.9922 us 5.0386 us 5.0940 us]
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
encode (fast) time: [6.6025 us 6.6636 us 6.7371 us]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
And 4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [21.082 ms 21.620 ms 22.211 ms]
encode (fast) time: [26.457 ms 27.104 ms 27.882 ms]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
If you are interested in creating your own iterator, you can mix-and-match the procedural code with more functional code:
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next(); // See footnote 1
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
1 — thanks to Stargateur for pointing out that eagerly getting the first value helps branch prediction.
4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [19.888 ms 20.301 ms 20.794 ms]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
encode (tiny) time: [19.150 ms 19.262 ms 19.399 ms]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
I believe this more clearly shows the main fundamental difference between the two implementations: an iterator-based solution is resumable. Every time we call next
, we need to see if there was a previous character that we've read (self.saved
). This adds a branch to the code that isn't there in the procedural code.
On the flip side, the iterator-based solution is more flexible — we can now compose all sorts of transformations on the data, or write directly to a file instead of a String
, etc. The custom iterator can be extended to operate on a generic type instead of char
as well, making it very flexible.
See also:
- How can I add new methods to Iterator?
If I want to write high performance code, should I ever use this functional style?
I would, until benchmarking shows that it's the bottleneck. Then evaluate why it's the bottleneck.
Supporting code
Always got to show your work, right?
benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion}; // 0.2.11
use rle::*;
fn criterion_benchmark(c: &mut Criterion) {
let data = rand_data(4 * 1024 * 1024);
c.bench_function("encode (procedural)", {
let data = data.clone();
move |b| b.iter(|| encode_proc(&data))
});
c.bench_function("encode (functional)", {
let data = data.clone();
move |b| b.iter(|| encode_iter(&data))
});
c.bench_function("encode (fast)", {
let data = data.clone();
move |b| b.iter(|| encode_slim(&data))
});
c.bench_function("encode (tiny)", {
let data = data.clone();
move |b| b.iter(|| encode_tiny(&data))
});
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
lib.rs
use itertools::Itertools; // 0.8.0
use rand; // 0.6.5
pub fn rand_data(len: usize) -> String {
use rand::distributions::{Alphanumeric, Distribution};
let mut rng = rand::thread_rng();
Alphanumeric.sample_iter(&mut rng).take(len).collect()
}
pub fn encode_proc(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
pub fn encode_iter(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next();
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn all_the_same() {
let data = rand_data(1024);
let a = encode_proc(&data);
let b = encode_iter(&data);
let c = encode_slim(&data);
let d = encode_tiny(&data);
assert_eq!(a, b);
assert_eq!(a, c);
assert_eq!(a, d);
}
}
TL;DR
A functional implementation can be faster than your original procedural implementation, in certain cases.
Why is the functional style so much slower than the imperative style? Is there some problem with the functional implementation which is causing such a huge slowdown?
As Matthieu M. already pointed out, the important thing to note is that the algorithm matters. How that algorithm is expressed (procedural, imperative, object-oriented, functional, declarative) generally doesn't matter.
I see two main issues with the functional code:
Allocating numerous strings over and over is inefficient. In the original functional implementation, this is done via
to_string
andformat!
.There's the overhead of using
group_by
, which exists to give a nested iterator, which you don't need just to get the counts.
Using more of itertools (batching
, take_while_ref
, format_with
) brings the two implementations much closer:
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
A benchmark of 1KiB of random alphanumeric data:
encode (procedural) time: [4.9922 us 5.0386 us 5.0940 us]
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
encode (fast) time: [6.6025 us 6.6636 us 6.7371 us]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
And 4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [21.082 ms 21.620 ms 22.211 ms]
encode (fast) time: [26.457 ms 27.104 ms 27.882 ms]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
If you are interested in creating your own iterator, you can mix-and-match the procedural code with more functional code:
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next(); // See footnote 1
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
1 — thanks to Stargateur for pointing out that eagerly getting the first value helps branch prediction.
4MiB of data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [19.888 ms 20.301 ms 20.794 ms]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
encode (tiny) time: [19.150 ms 19.262 ms 19.399 ms]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
I believe this more clearly shows the main fundamental difference between the two implementations: an iterator-based solution is resumable. Every time we call next
, we need to see if there was a previous character that we've read (self.saved
). This adds a branch to the code that isn't there in the procedural code.
On the flip side, the iterator-based solution is more flexible — we can now compose all sorts of transformations on the data, or write directly to a file instead of a String
, etc. The custom iterator can be extended to operate on a generic type instead of char
as well, making it very flexible.
See also:
- How can I add new methods to Iterator?
If I want to write high performance code, should I ever use this functional style?
I would, until benchmarking shows that it's the bottleneck. Then evaluate why it's the bottleneck.
Supporting code
Always got to show your work, right?
benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion}; // 0.2.11
use rle::*;
fn criterion_benchmark(c: &mut Criterion) {
let data = rand_data(4 * 1024 * 1024);
c.bench_function("encode (procedural)", {
let data = data.clone();
move |b| b.iter(|| encode_proc(&data))
});
c.bench_function("encode (functional)", {
let data = data.clone();
move |b| b.iter(|| encode_iter(&data))
});
c.bench_function("encode (fast)", {
let data = data.clone();
move |b| b.iter(|| encode_slim(&data))
});
c.bench_function("encode (tiny)", {
let data = data.clone();
move |b| b.iter(|| encode_tiny(&data))
});
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
lib.rs
use itertools::Itertools; // 0.8.0
use rand; // 0.6.5
pub fn rand_data(len: usize) -> String {
use rand::distributions::{Alphanumeric, Distribution};
let mut rng = rand::thread_rng();
Alphanumeric.sample_iter(&mut rng).take(len).collect()
}
pub fn encode_proc(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
pub fn encode_iter(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next();
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn all_the_same() {
let data = rand_data(1024);
let a = encode_proc(&data);
let b = encode_iter(&data);
let c = encode_slim(&data);
let d = encode_tiny(&data);
assert_eq!(a, b);
assert_eq!(a, c);
assert_eq!(a, d);
}
}
edited 7 hours ago
answered 8 hours ago
ShepmasterShepmaster
162k16333478
162k16333478
Great iterator!
– Matthieu M.
5 hours ago
add a comment |
Great iterator!
– Matthieu M.
5 hours ago
Great iterator!
– Matthieu M.
5 hours ago
Great iterator!
– Matthieu M.
5 hours ago
add a comment |
Let's review the functional implementation!
Memory Allocations
One of the big issues of the functional style proposed here is the map
method which allocates a lot. Every single character is first mapped to a String
, before being collected.
It also uses the format
machinery, which is known to be relatively slow.
Sometimes, people try way too hard to get a "pure" functional solution, instead:
let mut result = String::new();
for (c, group) in &source.chars().group_by(|&c| c) {
let count = group.count();
if count > 1 {
result.push_str(&count.to_string());
}
result.push(c);
}
is about as verbose, yet only allocates when count > 1
just like your solution does and does not use the format
machinery either.
I would expect a significant performance win compared to the full functional solution, while at the same time still leveraging group_by
for extra readability compared to the full imperative solution. Sometimes, you ought to mix and match!
That certainly gives us a speed boost, but it is still around 3x slower than the imperative version (30s rather than 10s in my tests). In fact, even if I only push a constant letter in that for loop it is still about 14s, so around 50% slower than the imperative version. That leads me to believe that group_by is probably not zero cost for this use case. Answer accepted anyway!
– David Copernicus Bowie
9 hours ago
@DavidCopernicusBowie For complex questions like this, you may want to wait before accepting an answer. See Shepmaster's one below =)
– Paul Stenne
8 hours ago
1
How can I append a formatted string to an existing String?
– Shepmaster
8 hours ago
add a comment |
Let's review the functional implementation!
Memory Allocations
One of the big issues of the functional style proposed here is the map
method which allocates a lot. Every single character is first mapped to a String
, before being collected.
It also uses the format
machinery, which is known to be relatively slow.
Sometimes, people try way too hard to get a "pure" functional solution, instead:
let mut result = String::new();
for (c, group) in &source.chars().group_by(|&c| c) {
let count = group.count();
if count > 1 {
result.push_str(&count.to_string());
}
result.push(c);
}
is about as verbose, yet only allocates when count > 1
just like your solution does and does not use the format
machinery either.
I would expect a significant performance win compared to the full functional solution, while at the same time still leveraging group_by
for extra readability compared to the full imperative solution. Sometimes, you ought to mix and match!
That certainly gives us a speed boost, but it is still around 3x slower than the imperative version (30s rather than 10s in my tests). In fact, even if I only push a constant letter in that for loop it is still about 14s, so around 50% slower than the imperative version. That leads me to believe that group_by is probably not zero cost for this use case. Answer accepted anyway!
– David Copernicus Bowie
9 hours ago
@DavidCopernicusBowie For complex questions like this, you may want to wait before accepting an answer. See Shepmaster's one below =)
– Paul Stenne
8 hours ago
1
How can I append a formatted string to an existing String?
– Shepmaster
8 hours ago
add a comment |
Let's review the functional implementation!
Memory Allocations
One of the big issues of the functional style proposed here is the map
method which allocates a lot. Every single character is first mapped to a String
, before being collected.
It also uses the format
machinery, which is known to be relatively slow.
Sometimes, people try way too hard to get a "pure" functional solution, instead:
let mut result = String::new();
for (c, group) in &source.chars().group_by(|&c| c) {
let count = group.count();
if count > 1 {
result.push_str(&count.to_string());
}
result.push(c);
}
is about as verbose, yet only allocates when count > 1
just like your solution does and does not use the format
machinery either.
I would expect a significant performance win compared to the full functional solution, while at the same time still leveraging group_by
for extra readability compared to the full imperative solution. Sometimes, you ought to mix and match!
Let's review the functional implementation!
Memory Allocations
One of the big issues of the functional style proposed here is the map
method which allocates a lot. Every single character is first mapped to a String
, before being collected.
It also uses the format
machinery, which is known to be relatively slow.
Sometimes, people try way too hard to get a "pure" functional solution, instead:
let mut result = String::new();
for (c, group) in &source.chars().group_by(|&c| c) {
let count = group.count();
if count > 1 {
result.push_str(&count.to_string());
}
result.push(c);
}
is about as verbose, yet only allocates when count > 1
just like your solution does and does not use the format
machinery either.
I would expect a significant performance win compared to the full functional solution, while at the same time still leveraging group_by
for extra readability compared to the full imperative solution. Sometimes, you ought to mix and match!
edited 9 hours ago
Stargateur
9,68542151
9,68542151
answered 10 hours ago
Matthieu M.Matthieu M.
206k29284522
206k29284522
That certainly gives us a speed boost, but it is still around 3x slower than the imperative version (30s rather than 10s in my tests). In fact, even if I only push a constant letter in that for loop it is still about 14s, so around 50% slower than the imperative version. That leads me to believe that group_by is probably not zero cost for this use case. Answer accepted anyway!
– David Copernicus Bowie
9 hours ago
@DavidCopernicusBowie For complex questions like this, you may want to wait before accepting an answer. See Shepmaster's one below =)
– Paul Stenne
8 hours ago
1
How can I append a formatted string to an existing String?
– Shepmaster
8 hours ago
add a comment |
That certainly gives us a speed boost, but it is still around 3x slower than the imperative version (30s rather than 10s in my tests). In fact, even if I only push a constant letter in that for loop it is still about 14s, so around 50% slower than the imperative version. That leads me to believe that group_by is probably not zero cost for this use case. Answer accepted anyway!
– David Copernicus Bowie
9 hours ago
@DavidCopernicusBowie For complex questions like this, you may want to wait before accepting an answer. See Shepmaster's one below =)
– Paul Stenne
8 hours ago
1
How can I append a formatted string to an existing String?
– Shepmaster
8 hours ago
That certainly gives us a speed boost, but it is still around 3x slower than the imperative version (30s rather than 10s in my tests). In fact, even if I only push a constant letter in that for loop it is still about 14s, so around 50% slower than the imperative version. That leads me to believe that group_by is probably not zero cost for this use case. Answer accepted anyway!
– David Copernicus Bowie
9 hours ago
That certainly gives us a speed boost, but it is still around 3x slower than the imperative version (30s rather than 10s in my tests). In fact, even if I only push a constant letter in that for loop it is still about 14s, so around 50% slower than the imperative version. That leads me to believe that group_by is probably not zero cost for this use case. Answer accepted anyway!
– David Copernicus Bowie
9 hours ago
@DavidCopernicusBowie For complex questions like this, you may want to wait before accepting an answer. See Shepmaster's one below =)
– Paul Stenne
8 hours ago
@DavidCopernicusBowie For complex questions like this, you may want to wait before accepting an answer. See Shepmaster's one below =)
– Paul Stenne
8 hours ago
1
1
How can I append a formatted string to an existing String?
– Shepmaster
8 hours ago
How can I append a formatted string to an existing String?
– Shepmaster
8 hours ago
add a comment |
David Copernicus Bowie is a new contributor. Be nice, and check out our Code of Conduct.
David Copernicus Bowie is a new contributor. Be nice, and check out our Code of Conduct.
David Copernicus Bowie is a new contributor. Be nice, and check out our Code of Conduct.
David Copernicus Bowie is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55675093%2fwhat-are-the-performance-impacts-of-functional-rust%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
The difference looks extremely surprising to me; that's a factor of x15! Have you checked that both implementations yield the same result?
– Matthieu M.
11 hours ago
@MatthieuM. yep, or at least both functions pass all unit tests defined by exercism.
– David Copernicus Bowie
10 hours ago
1
I am thinking that there should be a way to replace the
map
step with aflat_map
step, with a special-purpose iterator implementation taking the character and count and outputting the required stream of bytes. Forward encoding the integer is a bit tricky, but not too bad withcount_leading_zeroes
giving a hint of the magnitude (clz(i) * 77 / 256
gives the log 10).– Matthieu M.
10 hours ago