
Interpolate Spotify audiobook duration from text size
Source: R/interpolate_spotify_audiobook_duration.R
Estimate Spotify audiobook durations for new chapters or books from a reference data set where Spotify duration is already known. The predictor can be either text file size in bytes or word count. File size is often the simplest option when all chapters are plain text files created by the same workflow.
Usage
interpolate_spotify_audiobook_duration(
  reference,
  target = NULL,
  duration_col,
  books_path = NULL,
  target_book = NULL,
  book_col = "book",
  extension = "txt",
  reference_book_col = NULL,
  size_col = NULL,
  words_col = NULL,
  file_col = NULL,
  text_col = NULL,
  measure = c("file_size", "word_count"),
  duration_unit = c("seconds", "minutes", "hours", "hms"),
  output_unit = c("minutes", "seconds", "hours", "hms"),
  method = c("ratio", "lm")
)
Arguments
- reference
  Data frame with known Spotify durations and, unless books_path is supplied, a text-size predictor.
- target
  Data frame with chapters or books to estimate. When books_path and target_book are supplied, this can be left as NULL.
- duration_col
  Character scalar. Column in reference containing known Spotify duration.
- books_path
  Character scalar or NULL. Optional folder containing one subfolder per book, with chapter text files inside each book folder.
- target_book
  Character vector or NULL. Book folder name(s) to estimate when books_path is supplied.
- book_col
  Character scalar. Book identifier column in reference.
- extension
  Character scalar. File extension to read from books_path.
- reference_book_col
  Character scalar or NULL. Optional book identifier in reference. When supplied, the predictor is summed within each book and duration_col must contain one unique duration per book. Use this when reference rows are chapter-level but Spotify durations are book-level.
- size_col
  Character scalar or NULL. Column containing file sizes in bytes. Use this when measure = "file_size".
- words_col
  Character scalar or NULL. Column containing word counts. Use this when measure = "word_count".
- file_col
  Character scalar or NULL. Column containing paths to text files. If supplied with measure = "file_size", file sizes are computed with file.info(). If supplied with measure = "word_count", words are counted from the files.
- text_col
  Character scalar or NULL. Column containing text strings to measure directly.
- measure
  Character scalar. Either "file_size" or "word_count".
- duration_unit
  Unit of duration_col: "seconds", "minutes", "hours", or "hms" for strings like "6:11:00".
- output_unit
  Unit for the returned estimate column. Use "hms" for spreadsheet-friendly strings like "5:56:00".
- method
  Estimation method. "ratio" fits a single seconds-per-unit rate through the origin. "lm" fits a linear model with an intercept.
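The difference between the two method options can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: it assumes the "ratio" rate is total reference seconds divided by total reference units (a zero-intercept least-squares fit, lm(seconds ~ 0 + size), would be another through-the-origin reading), and the data and variable names are made up.

```r
# Hypothetical reference data: file size in bytes, known duration in seconds.
ref <- data.frame(
  size    = c(100000, 150000, 200000),
  seconds = c(7300, 10700, 14500)
)
new_size <- c(25000, 50000)

# "ratio"-style estimate (assumed form): one pooled seconds-per-byte rate,
# so a size of zero always maps to a duration of zero.
rate <- sum(ref$seconds) / sum(ref$size)
ratio_est <- rate * new_size

# "lm"-style estimate: a straight line with an intercept, which can absorb
# a fixed overhead per recording at the cost of one extra parameter.
fit <- lm(seconds ~ size, data = ref)
lm_est <- unname(predict(fit, newdata = data.frame(size = new_size)))

data.frame(size = new_size, ratio_est = ratio_est, lm_est = lm_est)
```

With only a handful of reference rows the intercept is poorly determined, so the through-the-origin rate is probably the safer choice when targets are much smaller than anything in the reference set.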
Value
A tibble containing target plus .duration_seconds, an
estimated_duration_* column in output_unit, .duration_measure, and
.duration_method. The total estimated duration is also stored in the
estimated_total_seconds and estimated_total_* attributes.
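For readers who want to move between .duration_seconds and "hms" strings themselves, the conversion can be sketched as follows. Both helpers are hypothetical and not part of the package API; the sketch assumes an "H:MM:SS" layout with exactly three fields and rounds to the nearest second, which may differ from what the function does internally.

```r
# Hypothetical helpers, not part of the package API.

# Parse "H:MM:SS" strings into seconds.
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    stopifnot(length(p) == 3)  # assumed layout: hours, minutes, seconds
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

# Format seconds as "H:MM:SS", rounding to the nearest second.
seconds_to_hms <- function(s) {
  s <- round(s)
  sprintf(
    "%d:%02d:%02d",
    as.integer(s %/% 3600),
    as.integer((s %% 3600) %/% 60),
    as.integer(s %% 60)
  )
}

hms_to_seconds("6:11:00")  # 22260
seconds_to_hms(21360)      # "5:56:00"
```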
Examples
reference <- tibble::tibble(
  book = c("A", "B"),
  file_size_bytes = c(100000, 150000),
  spotify_duration_minutes = c(120, 180)
)
chapters <- tibble::tibble(
  chapter = c("chapter_1", "chapter_2"),
  file_size_bytes = c(25000, 50000)
)
interpolate_spotify_audiobook_duration(
  reference,
  chapters,
  duration_col = "spotify_duration_minutes",
  size_col = "file_size_bytes",
  duration_unit = "minutes"
)
#> # A tibble: 2 × 6
#>   chapter   file_size_bytes .duration_seconds estimated_duration_minutes
#>   <chr>               <dbl>             <dbl>                      <dbl>
#> 1 chapter_1           25000              1800                         30
#> 2 chapter_2           50000              3600                         60
#> # ℹ 2 more variables: .duration_measure <chr>, .duration_method <chr>