Proposal
Add the following methods
impl str {
fn substr_range(&self, item: &str) -> Option<Range<usize>>;
}
impl<T> [T] {
fn subslice_range(&self, item: &[T]) -> Option<Range<usize>>;
fn elem_offset(&self, item: &T) -> Option<usize>;
}
NOTE: These methods are completely different in both behavior and functionality from str::find and friends. They do not compare characters/items inside of the str/slices. Instead they utilize pointer arithmetic to find where a subslice/item is located in the str/slice.
Problem statement
This change attempts to fix two distinct issue. The first of which is the inflexibility of slice::split, str::split, str::matches, and related methods. The second is for parse errors.
Problem 1
str::split and its related functions are really convenient, but I find myself having to manually implement them if I want more complex logic.
For instance, lets say that I want to split up a string by commas. Specifically, I want to do something like str::split_inclusive, but I want to include the separators at the beginning instead of the end.
You cannot do this with str::split
let foo = "a,bc,d";
for sub in foo.split(",") {
// Now I have the split up string, but it doesn't have the comma at the beginning :-(
}
Instead, you have to manually do it yourself
let mut foo = "a,bc,d";
while let Some(comma_idx) = foo.get(1..).map(|f| 1+f.find(",").unwrap_or(f.len())) {
let sub = &foo[..comma_idx];
foo = &foo[comma_idx..];
// sub now has leading comma
}
Problem 2
Say I have a function for parsing and a helper function. The Range denotes where in the string the error occurs. We can use that Range to tell the user where the error is if one occurs.
pub fn parse<'a>(in: &'a str) -> Result<usize, Range<usize>> {
// omitted //
let a = parse_helper(&in[x..y])?;
let b = parse_helper(&in[y+1..])?;
// omitted //
}
fn parse_helper<'a>(in: &'a str) -> Result<usize, Range<usize>> { /* omitted */ }
The issue with this is that the range error from parse_helper is relative to the substring passed to it. This means that we have to manually offset the Range inside of parse.
pub fn parse<'a>(in: &'a str) -> Result<usize, Range<usize>> {
// omitted //
let a = parse_helper(&in[x..y]).map_err(|e| e.start + x..e.end + x)?;
let b = parse_helper(&in[y+1..]).map_err(|e| e.start + y+1..e.end + y+1)?;
// omitted //
}
// ... //
We could also pass an offset to parse_helper and use that to adjust the returned Range, but that would just move the complexity to the parse_helper function.
This gets even worse if instead of a Range, we have an enum that contains Ranges. Say
enum ParseError {
Error1(Range<usize>),
Error2(Range<usize>, Range<usize>),
}
Motivating examples or use cases
Split and similar methods
subslice_offset would allow using indices to extend methods like str::split. We can implement the aforementioned inclusive str::split like so:
let foo = "a,bc,d,";
for sub in foo.split(",") {
let idx = foo.substr_range(sub).unwrap();
let sub = &foo[idx.start.saturating_sub(1)..idx.end];
// Now `sub` is "a", ",bc", ",d", and ","
}
Error handling with string input data
We can also use this method to remove complexity from the code described in Problem 2.
pub fn parse<'a>(in: &'a str) -> Result<usize, &'a str> {
// omitted //
let a = parse_helper(&in[x..y])?;
let b = parse_helper(&in[y+1..])?;
// omitted //
}
fn parse_helper<'a>(in: &'a str) -> Result<usize, &'a str> { /* omitted */ }
Instead of returning Ranges, we can return &'a str. Then, if a caller of parse wants to find where the error occurred, they can do
let input = "9+10-21";
match parse(input) {
Ok(val) => { /* omitted */ },
Err(err) => {
display_error(input.substr_range(err).unwrap());
},
Solution sketch
impl str {
fn substr_range(&self, substr: &str) -> Option<Range<usize>> {
self.as_bytes().subslice_range(substr.as_bytes())
}
}
impl<T> [T] {
fn subslice_range(&self, subslice: &[T]) -> Option<Range<usize>> {
if core::mem::size_of::<T>() == 0 {
panic!();
}
let self_start = self.as_ptr() as usize;
let subslice_start = subslice.as_ptr() as usize;
let byte_start = subslice_start.wrapping_sub(self_start);
let start = byte_start / core::mem::size_of::<T>();
let end = start.wrapping_add(subslice.len());
if byte_start % core::mem::size_of::<T>() != 0 {
return None;
}
if start <= self.len() && end <= self.len() {
Some(start..end)
} else {
None
}
}
fn elem_offset(&self, element: &T) -> Option<usize> {
if core::mem::size_of::<T>() == 0 {
panic!();
}
let self_start = self.as_ptr() as usize;
let elem_start = element as *const T as usize;
let byte_offset = elem_start.wrapping_sub(self_start);
let offset = byte_offset / core::mem::size_of::<T>();
if byte_offset % core::mem::size_of::<T>() != 0 {
return None;
}
if offset < self.len() {
Some(offset)
} else {
None
}
}
}
The following code demonstrates the above behavior
let foo = "mississippi".to_owned();
let is = &foo[4..6];
assert_eq!(is, "is");
assert_eq!(foo.substr_offset(is), Some(4..6));
assert_eq!(foo.substr_offset("is"), None);
Alternatives
Links and related work
subslice_offset crate
Old, deprecated str::subslice_offset. (deprecated here).
This was deprecated with str::find being listed as its replacement, but as mentioned before, this has different functionality.
Original subslice_offset PR
rust-lang/rfcs#2796 (an RFC similar to this but only for slices). The PR for this was abandoned because the author did not have time to make suggested changes.
https://github.com/wr7/rfcs/blob/substr_range/text/3648-substr-range.md - An RFC that I wrote for this
https://stackoverflow.com/questions/50781561/how-to-find-the-starting-offset-of-a-string-slice-of-another-string
Proposal
Add the following methods
NOTE: These methods are completely different in both behavior and functionality from
str::findand friends. They do not compare characters/items inside of thestr/slices. Instead they utilize pointer arithmetic to find where a subslice/item is located in thestr/slice.Problem statement
This change attempts to fix two distinct issue. The first of which is the inflexibility of
slice::split,str::split,str::matches, and related methods. The second is for parse errors.Problem 1
str::splitand its related functions are really convenient, but I find myself having to manually implement them if I want more complex logic.For instance, lets say that I want to split up a string by commas. Specifically, I want to do something like
str::split_inclusive, but I want to include the separators at the beginning instead of the end.You cannot do this with
str::splitInstead, you have to manually do it yourself
Problem 2
Say I have a function for parsing and a helper function. The
Rangedenotes where in the string the error occurs. We can use thatRangeto tell the user where the error is if one occurs.The issue with this is that the range error from
parse_helperis relative to the substring passed to it. This means that we have to manually offset theRangeinside ofparse.We could also pass an offset to
parse_helperand use that to adjust the returnedRange, but that would just move the complexity to theparse_helperfunction.This gets even worse if instead of a
Range, we have an enum that containsRanges. SayMotivating examples or use cases
Split and similar methods
subslice_offsetwould allow using indices to extend methods likestr::split. We can implement the aforementioned inclusivestr::splitlike so:Error handling with string input data
We can also use this method to remove complexity from the code described in Problem 2.
Instead of returning
Ranges, we can return&'a str. Then, if a caller ofparsewants to find where the error occurred, they can doSolution sketch
The following code demonstrates the above behavior
Alternatives
Links and related work
subslice_offset crate
Old, deprecated
str::subslice_offset. (deprecated here).This was deprecated with
str::findbeing listed as its replacement, but as mentioned before, this has different functionality.Original
subslice_offsetPRrust-lang/rfcs#2796 (an RFC similar to this but only for slices). The PR for this was abandoned because the author did not have time to make suggested changes.
https://github.com/wr7/rfcs/blob/substr_range/text/3648-substr-range.md - An RFC that I wrote for this
https://stackoverflow.com/questions/50781561/how-to-find-the-starting-offset-of-a-string-slice-of-another-string