CodeAndRoll

A collection of custom R functions. Works with MarkdownReports, SeuratUtils but also as a standalone set of more than 200 productivity tools.


DOI If you use these functions, please star the repo, or cite via DOI. Thanks!


CodeAndRoll is a collection of custom R functions. It works with MarkdownReports and SeuratUtils, but also as a standalone set of more than 200 productivity tools. Many of my other repos/libraries may depend on these functions. Source: own work + web (sources are referenced in the description and/or source code). Intended for my personal use, shared because others may find (parts of) it useful.

News

CodeAndRoll (v1) repository is decommissioned.

Use the packages below:

Package Reorganisation Diagram

Install

1.) Download CodeAndRoll.R, save it as a local .R file, and run source("~/path/to/CodeAndRoll.R").

2.) Directly source from the web:

source("https://raw.githubusercontent.com/vertesy/CodeAndRoll/master/CodeAndRoll.R")

Troubleshooting

If you encounter a bug, or something doesn’t work or is unclear, please let me know by raising an issue on CodeAndRoll. Please check first whether it has already been asked.

Usage

After source("~/path/to/CodeAndRoll.R") you can use any of the functions listed below. Some of the functions have a minimal example in the .R script, just below the function’s definition.
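Sourcing a script with 200+ functions defines all of them in the calling environment, where they can mask objects of the same name. One way around this is to source into a dedicated environment with base R's sys.source(). The sketch below is only an illustration: the temporary script and its function `hello_demo` are hypothetical stand-ins for CodeAndRoll.R, so that the example runs offline.

```r
# Stand-in for CodeAndRoll.R: a temporary script defining one function.
# (Hypothetical content; in practice set script_path to "~/path/to/CodeAndRoll.R".)
script_path <- tempfile(fileext = ".R")
writeLines("hello_demo <- function(x) paste('Hello', x)", script_path)

# Source into a dedicated environment instead of the global one.
car_env <- new.env()
sys.source(script_path, envir = car_env)

# List what the script defined, then call a function from that environment.
defined_functions <- ls(car_env)
greeting <- car_env$hello_demo("world")
print(defined_functions)
print(greeting)
```

With this approach the functions are called as car_env$function_name(), which keeps the global workspace clean at the cost of slightly longer calls.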

Chapters

The script is roughly organised in the following sections / categories:

  1. File handling, export, import [read & write]
  2. Clipboard interaction (OS X)
  3. Reading files in
  4. Writing files out
  5. Vector operations
  6. Vector filtering
  7. Matrix operations
  8. List operations
  9. Set operations
  10. Math and stats
  11. String operations
  12. Plotting and Graphics
  13. Read and write plotting functions READ
  14. Generic
  15. Plots
  16. New additions

List of Functions

Note that this library is under continuous development. Some functions listed here may no longer be in CodeAndRoll.R, and new functions in CodeAndRoll.R may not be listed here yet. Backward compatibility is usually, but not always, maintained. See the other files in the repo if you are missing a function.

String operations

  1. ppp():

    Paste by point

  2. pps():

    Paste by (forward) slash

  3. ppu():

    Paste by underscore

  4. ppd():

    Paste by dash

  5. kpp():

    kollapse by point

  6. kppu():

    kollapse by underscore

  7. kppd():

    kollapse by dash

  8. stry():

    Silent try

  9. say():

    Use system voice to notify (after a long task is done)

  10. sayy():

    Use system voice to notify (after a long task is done)

  11. grepv():

    grep returning the value.

  12. unload():

    Unload a package. Source Stackoverflow.

  13. clip2clip.vector():

    Copy from the clipboard (e.g. Excel), convert to an R-formatted vector, and copy the result back to the clipboard.

  14. clip2clip.commaSepString():

    read a comma separated string (e.g. list of gene names) and properly format it for R.

  15. read.simple.vec():

    Read each line of a file to an element of a vector (read in new-line separated values, no header!).

  16. read.simple():

    It is essentially read.table() with file/path parsing.

  17. read.simple_char_list():

    Read in a file.

  18. read.simple.table():

    Read in a file. Default: the header defines colnames, no rownames. For rownames, give the number of the column containing them, e.g. 1. The header should start with a TAB / the first column name should be empty.

  19. FirstCol2RowNames():

    Set First Col to Row Names
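To illustrate the one-line descriptions above, here is a minimal sketch of how paste and grep shortcuts of this kind could be written in base R. These are not the definitions from CodeAndRoll.R: the `_sketch` names and the argument lists are assumptions made for illustration, and the real functions may differ.

```r
# Hypothetical re-implementations, named *_sketch so they cannot mask the real functions.
ppp_sketch <- function(...) paste(..., sep = ".")            # paste by point
ppu_sketch <- function(...) paste(..., sep = "_")            # paste by underscore
kpp_sketch <- function(...) paste(c(...), collapse = ".")    # collapse by point
grepv_sketch <- function(pattern, x, ...) {
  grep(pattern, x, value = TRUE, ...)                        # grep returning the values
}

sample_names <- ppp_sketch("sample", 1:2)    # vectorised: one name per index
file_tag     <- ppu_sketch("run", "A")       # joins arguments with an underscore
collapsed    <- kpp_sketch("a", "b", "c")    # collapses everything into one string
hox_genes    <- grepv_sketch("^HOX", c("HOXA1", "ACTB", "HOXB2"))

print(sample_names)
print(file_tag)
print(collapsed)
print(hox_genes)
```

The difference between the "paste" and "kollapse" variants is the usual one between the sep and collapse arguments of paste(): the former joins arguments element-wise, the latter reduces a vector to a single string.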

File handling, export, import

  1. read.simple.tsv():

    Read in a file with excel style data: rownames in col1, headers SHIFTED. The header should start with a TAB / First column name should be empty.

  2. read.simple.csv():

    Read in a file with excel style data: rownames in col1, headers SHIFTED. The header should start with a TAB / First column name should be empty.

  3. read.simple.ssv():

    Space separated values. Read in a file with excel style data: rownames in col1, headers SHIFTED. The header should start with a TAB / First column name should be empty.

  4. read.simple.tsv.named.vector():

    Read in a file with excel style named vectors, names in col1, headers SHIFTED. The header should start with a TAB / First column name should be empty.

  5. convert.tsv.data():

    Fix NA issue in dataframes imported by the new read.simple.tsv. Set na_rep to NA if you want to keep NA-s

  6. read.simple.xls():

    Read multi-sheet excel files. row_namePos = NULL for automatic names

  7. write.simple():

    Write out a matrix-like R-object to a file as tab separated values (.tsv). The output filename will be the variable’s name. The output file will be located in “OutDir”, specified by you at the beginning of the script, or under your current working directory. You can pass the PATH and VARIABLE separately (in order); they will be concatenated to the filename.

  8. write.simple.vec():

    Write out a vector-like R-object to a file as newline separated values (.vec). The output filename will be the variable’s name. The output file will be located in “OutDir”, specified by you at the beginning of the script, or under your current working directory. You can pass the PATH and VARIABLE separately (in order); they will be concatenated to the filename.

  9. write.simple.xlsx():

    Write out a list of matrices / data frames WITH ROW- AND COLUMN-NAMES to an Excel (.xlsx) file. The output filename will be the variable’s name. The output file will be located in “OutDir”, specified by you at the beginning of the script, or under your current working directory. You can pass the PATH and VARIABLE separately (in order); they will be concatenated to the filename.

  10. write.simple.append():

    Append an R-object WITHOUT ROWNAMES to an existing .tsv file with the same number of columns. The output filename will be the variable’s name. The output file will be located in “OutDir”, specified by you at the beginning of the script, or under your current working directory. You can pass the PATH and VARIABLE separately (in order); they will be concatenated to the filename.

  11. sstrsplit():

    Alias for str_split_fixed in the stringr package

  12. topN.dfCol():

    Find the n highest values in a named vector

  13. bottomN.dfCol():

    Find the n lowest values in a named vector

  14. as.named.vector():

    Convert a dataframe column or row into a vector, keeping the corresponding dimension name.

  15. col2named.vector():

    Convert a dataframe column into a vector, keeping the corresponding dimension name.

  16. row2named.vector():

    Convert a dataframe row into a vector, keeping the corresponding dimension name.

  17. as.numeric.wNames():

    Converts any vector into a numeric vector, and puts the original character values into the names of the new vector, unless it already has names. Useful for coloring a plot by categories, name-tags, etc.

  18. as.numeric.wNames.old():

    Converts any vector into a numeric vector, and puts the original character values into the names of the new vector, unless it already has names. Useful for coloring a plot by categories, name-tags, etc.

  19. as.character.wNames():

    Converts your input vector into a character vector, and puts the original character values into the names of the new vector, unless it already has names.

  20. rescale():

    linear transformation to a given range of values

  21. flip_value2name():

    Flip the values and the names of a vector with names

  22. sortbyitsnames():

    Sort a vector by the alphanumeric order of its names (instead of its values).

  23. any.duplicated():

    How many entries are duplicated

  24. which.duplicated():

    orig =rownames(sc@expdata)

  25. which.NA():

    orig =rownames(sc@expdata)

  26. which_names():

    Return the names where the input vector is TRUE. The input vector is converted to logical.

  27. which_names_grep():

    Return the vector elements whose names are partially matched

  28. na.omit.mat():

    Omit rows with NA values from a matrix. Rows with any, or full of NA-s

  29. inf.omit():

    Omit infinite values from a vector.

  30. zero.omit():

    Omit zero values from a vector.

  31. pc_TRUE():

    Percentage of true values in a logical vector, parsed as text (useful for reports.)

  32. NrAndPc():

    Summary stat. text formatting for logical vectors (%, length)

  33. pc_in_total_of_match():

    Percentage of a certain value within a vector or table.

  34. filter_survival_length():

    Parse a sentence reporting the % of filter survival.

  35. remove_outliers():

    Remove values that fall outside the trailing N % of the distribution.

  36. simplify_categories():

    Replace every entry that is found in “replaceit”, by a single value provided by “to”

  37. rotate():

    rotate a matrix 90 degrees.

  38. sortEachColumn():

    Sort each column of a numeric matrix / data frame.

  39. rowMedians():

    Calculates the median of each row of a numeric matrix / data frame.

  40. colMedians():

    Calculates the median of each column of a numeric matrix / data frame.

  41. rowGeoMeans():

    Calculates the geometric mean of each row of a numeric matrix / data frame.

  42. colGeoMeans():

    Calculates the geometric mean of each column of a numeric matrix / data frame.

  43. rowCV():

    Calculates the CV of each ROW of a numeric matrix / data frame.

  44. colCV():

    Calculates the CV of each column of a numeric matrix / data frame.

  45. rowVariance():

    Calculates the variance of each ROW of a numeric matrix / data frame.

  46. colVariance():

    Calculates the variance of each column of a numeric matrix / data frame.

  47. rowMin():

    Calculates the minimum of each row of a numeric matrix / data frame.

  48. colMin():

    Calculates the minimum of each column of a numeric matrix / data frame.

  49. rowMax():

    Calculates the maximum of each row of a numeric matrix / data frame.

  50. colMax():

    Calculates the maximum of each column of a numeric matrix / data frame.

  51. rowSEM():

    Calculates the SEM of each row of a numeric matrix / data frame.

  52. colSEM():

    Calculates the SEM of each column of a numeric matrix / data frame.

  53. rowSD():

    Calculates the standard deviation (SD) of each row of a numeric matrix / data frame.

  54. colSD():

    Calculates the standard deviation (SD) of each column of a numeric matrix / data frame.

  55. rowIQR():

    Calculates the interquartile range (IQR) of each row of a numeric matrix / data frame.

  56. colIQR():

    Calculates the interquartile range (IQR) of each column of a numeric matrix / data frame.

  57. rowquantile():

    Calculates the quantiles of each row of a numeric matrix / data frame.

  58. colquantile():

    Calculates the quantiles of each column of a numeric matrix / data frame.

  59. row.Zscore():

    Calculate Z-score over rows of data frame.

  60. rowACF():

    RETURNS A LIST. Calculates the autocorrelation of each row of a numeric matrix / data frame.

  61. colACF():

    RETURNS A LIST. Calculates the autocorrelation of each column of a numeric matrix / data frame.

  62. acf.exactLag():

    Autocorrelation with exact lag

  63. rowACF.exactLag():

    RETURNS A Vector for the “lag” based autocorrelation. Calculates the autocorrelation of each row of a numeric matrix / data frame.

  64. colACF.exactLag():

    RETURNS A Vector for the “lag” based autocorrelation. Calculates the autocorrelation of each column of a numeric matrix / data frame.

  65. colDivide():

    divide by column

  66. rowDivide():

    divide by row

  67. sort.mat():

    Sort a matrix. ALTERNATIVE: dd[with(dd, order(-z, b)), ]. Source: stackoverflow.

  68. rowNameMatrix():

    Create a copy of your matrix, where every entry is replaced by the corresponding row name. Useful if you want to color by row name in a plot (where you have different number of NA-values in each row).

  69. colNameMatrix():

    Create a copy of your matrix, where every entry is replaced by the corresponding column name. Useful if you want to color by column name in a plot (where you have different number of NA-values in each column).

  70. colsplit():

    split a data frame by a factor corresponding to columns.

  71. rowsplit():

    split a data frame by a factor corresponding to rows.

  72. TPM_normalize():

    normalize each column to 1 million

  73. median_normalize():

    normalize each column to the median of all the column-sums

  74. mean_normalize():

    normalize each column to the mean of all the column-sums

  75. rownames.trimws():

    trim whitespaces from the rownames

  76. select.rows.and.columns():

    Subset rows and columns. It checks whether the selected dimension names exist and reports any that are not found.

  77. getRows():

    Get the subset of rows with existing rownames, report how many it could not find.

  78. getCols():

    Get the subset of cols with existing colnames, report how many it could not find.

  79. get.oddoreven():

    Get odd or even columns or rows of a data frame

  80. combine.matrices.intersect():

    combine matrices by rownames intersect

  81. merge_dfs_by_rn():

    Merge any data frames by rownames. Requires the plyr package.

  82. merge_numeric_df_by_rn():

    Merge 2 numeric data frames by rownames

  83. attach_w_rownames():

    Take a data frame (of e.g. metadata) from your memory space, split it into vectors so you can directly use them. E.g.: Instead of metadata$color[blabla] use color[blabla]

  84. panel.cor.pearson():

    A function to display correlation values for pairs() function. Default is pearson correlation, that can be set to “kendall” or “spearman”.

  85. panel.cor.spearman():

    A function to display correlation values for pairs() function. Default is spearman correlation, that can be set to “kendall” or “pearson”.

  86. remove.na.rows():

    cols have to be a vector of numbers corresponding to columns

  87. remove.na.cols():

    cols have to be a vector of numbers corresponding to columns

  88. intersect.ls():

    Intersect any number of lists.

  89. union.ls():

    Union of any number of list elements. Faster than reduce.

  90. unlapply():

    lapply, then unlist

  91. list.wNames():

    create a list with names from ALL variables you pass on to the function

  92. as.list.df.by.row():

    Split a dataframe into a list by its rows. omit.empty for the list elements; na.omit and zero.omit are applied on entries inside each list element.

  93. as.list.df.by.col():

    Split a dataframe into a list by its columns. omit.empty for the list elements; na.omit and zero.omit are applied on entries inside each list element.

  94. reorder.list():

    reorder elements of lists in your custom order of names / indices.

  95. range.list():

    range of values in whole list

  96. intermingle2lists():

    Combine 2 lists (of the same length) so that they form the odd and the even elements of a unified list. Useful for side-by-side comparisons, e.g. in wstripchart_list().

  97. as.listalike():

    convert a vector to a list with certain dimensions, taken from the list it should resemble

  98. list2fullDF.byNames():

    Convert a list to a full matrix. Rows = names(union.ls(your_list)) or all names of within list elements, columns = names(your_list).

  99. list2fullDF.presence():

    Convert a list to a full matrix. Designed for occurrence counting, think of table(). Rows = all ENTRIES within your list, columns = names(your_list).

  100. splitbyitsnames():

    split a list by its names

  101. splititsnames_byValues():

    split a list by its names

  102. intermingle2vec():

    Combine 2 vectors (of the same length) so that they form the odd and the even elements of a unified vector.

  103. intermingle.cbind():

    Combine 2 data frames (of the same length) so that they form the odd and the even elements of a unified data frame. Useful for side-by-side comparisons, e.g. in wstripchart_list().

  104. pad.na():

    Fill up a vector to a given length with NA-values at the end.

  105. clip.values():

    Signal clipping. Cut values above or below a threshold.

  106. clip.outliers():

    Signal clipping based on the input data’s distribution. It clips values above or below the extreme N% of the distribution.

  107. ls2categvec():

    Convert a list to a vector repeating list-element names, while vector names are the list elements

  108. symdiff():

    Quasi symmetric difference of any number of vectors

  109. sem():

    Calculates the standard error of the mean (SEM) for a numeric vector (it excludes NA-s by default)

  110. fano():

    Calculates the fano factor on a numeric vector (it excludes NA-s by default)

  111. geomean():

    Calculates the geometric mean of a numeric vector (it excludes NA-s by default)

  112. mean_of_log():

    Calculates the mean of the log_k of a numeric vector (it excludes NA-s by default)

  113. movingAve():

    Calculates the moving / rolling average of a numeric vector.

  114. movingAve2():

  115. movingSEM():

    Calculates the moving / rolling standard error of the mean (SEM) on a numeric vector.

  116. imovingSEM():

    Calculates the moving / rolling standard error of the mean (SEM). It calculates it to the edge of the vector with incrementally smaller window-size.

  117. eval_parse_kollapse():

    evaluate and parse (dyn_var_caller)

  118. lookup():

    Awesome pattern matching for a set of values in another set of values. Returns a list with all kinds of results.

  119. richColors():

    Alias for rich.colors in gplots

  120. Color_Check():

    Display the colors encoded by the numbers / color-ID-s you pass on to this function

  121. colSums.barplot():

    Draw a barplot from ColSums of a matrix.

  122. lm_equation_formatter():

    Renders the lm() function’s output into a human readable text. (e.g. for subtitles)

  123. lm_equation_formatter2():

    Renders the lm() function’s output into a human readable text. (e.g. for subtitles)

  124. lm_equation_formatter3():

    Renders the lm() function’s output into a human readable text. (e.g. for subtitles)

  125. hist.XbyY():

    Split one variable by another. Calculates equal bins in splitby, and returns a list of the corresponding values in toSplit.

  126. flag.name_value():

    Returns the name and its value, if it is not FALSE.

  127. flag.nameiftrue():

    Returns the name and its value, if it is TRUE.

  128. flag.names_list():

    Returns the name and value of each element in a list of parameters.

  129. param.list.flag():

    Returns the name and value of each element in a list of parameters.

  130. quantile_breaks():

    Quantile breakpoints in any data vector. Source: slowkow.com.

  131. vec.fromNames():

    create a vector from a vector of names

  132. list.fromNames():

    create list from a vector with the names of the elements

  133. matrix.fromNames():

    Create a matrix from 2 vectors defining the row- and column names of the matrix. Default fill value: NA.

  134. matrix.fromVector():

    Create a matrix from values in a vector repeated for each column / each row. Similar to rowNameMatrix and colNameMatrix.

  135. array.fromNames():

    create an N-dimensional array from N vectors defining the row-, column, etc names of the array

  136. what():

    A better version of is(). It can print the first “printme” elements.

  137. idim():

    A dim() function that can handle if you pass on a vector: then, it gives the length.

  138. idimnames():

    A dimnames() function that can handle if you pass on a vector: it gives back the names.

  139. table_fixed_categories():

    generate a table() with a fixed set of categories. It fills up the table with missing categories, that are relevant when comparing to other vectors.

  140. stopif2():

    Stop script if the condition is met. You can parse anything (e.g. variables) in the message

  141. most_frequent_elements():

    Show the most frequent elements of a table

  142. top_indices():

    Returns the position / index of the n highest values. For equal values, it maintains the original order

  143. percentile2value():

    Calculate the actual value of the N-th percentile in a distribution or set of numbers. Useful for calculating cutoffs, and displaying them by whist()’s “vline” parameter.

  144. MaxN():

    find second (third…) highest/lowest value in vector

  145. hclust.getOrder.row():

    Extract ROW order from a pheatmap object.

  146. hclust.getOrder.col():

    Extract COLUMN order from a pheatmap object.

  147. hclust.getClusterID.row():

    Extract cluster ID’s for ROWS of a pheatmap object.

  148. hclust.getClusterID.col():

    Extract cluster ID’s for COLUMNS of a pheatmap object.

  149. hclust.ClusterSeparatingLines.row():

    Calculate the position of ROW separating lines between clusters in a pheatmap object.

  150. hclust.ClusterSeparatingLines.col():

    Calculate the position of COLUMN separating lines between clusters in a pheatmap object.

  151. Gap.Postions.calc.pheatmap():

    calculate gap positions for pheatmap, based on a sorted annotation vector of categories

  152. matlabColors.pheatmap():

    Create a Matlab-like color gradient using “colorRamps”.

  153. annot_col.create.pheatmap.vec():

    For VECTORS. Auxiliary function for pheatmap. Prepares the 2 variables needed for “annotation_col” and “annotation_colors” in pheatmap

  154. annot_col.create.pheatmap.df():

    For data frames. Auxiliary function for pheatmap. Prepares the 2 variables needed for “annotation_col” and “annotation_colors” in pheatmap

  155. annot_col.fix.numeric():

    fix class and color annotation in pheatmap annotation data frames and lists.

  156. annot_row.create.pheatmap.df():

    For data frames. Auxiliary function for pheatmap. Prepares the 2 variables needed for “annotation_row” and “annotation_colors” in pheatmap

  157. wPairConnector():

    Connect Pairs of datapoints with a line on a plot.

  158. numerate():

    numerate from x to y with additional zeropadding

  159. printEveryN():

    Report at every e.g. 1000

  160. zigzagger():

    mix entries so that they differ

  161. irequire():

    Load a package. If it does not exist, try to install it from CRAN.

  162. IfExistsAndTrue():

    Internal function. Checks if a variable is defined, and its value is TRUE.

  163. filter_InCircle():

    Find points in/out-side of a circle.

  164. cumsubtract():

    Cumulative subtraction, opposite of cumsum()

  165. trail():

    A combination of head() and tail() to see both ends.

  166. sort.decreasing():

    Sort in decreasing order.

  167. list.2.replicated.name.vec():

    Convert a list to a vector of list element names, each name replicated as many times as its list element has entries.

  168. idate():

    Parse current date, dot separated.

  169. view.head():

    view the head of an object by console.

  170. view.head2():

    view the head of an object by View().

  171. iidentical.names():

    Test if the names of two objects are exactly equal

  172. iidentical():

    Test if two objects are exactly equal

  173. iidentical.all():

    Test if two objects are exactly equal.

  174. parsepvalue():

    Parse p-value from a number to a string.

  175. shannon.entropy():

    Calculate shannon entropy

  176. id2titlecaseitalic():

    Convert a gene ID to title case italic

  177. id2titlecaseitalic.sp():

    Convert a gene ID to italic

  178. id2name():

    Convert a gene ID to a gene name (symbol). From / for RaceID.

  179. id2chr():

    Convert a gene ID to the chromosome. From / for RaceID.

  180. name2id():

    Convert a name to gene ID. From / for RaceID.

  181. name2id.toClipboard():

    Convert a name to gene ID, and copy to clipboard. From / for RaceID.

  182. name2id.fast():

    Convert a name to gene ID. From / for RaceID.

  183. legend.col():

    Legend color. Source: aurelienmadouasse.wordpress.com.

  184. copy.dimension.and.dimnames():

    copy dimension and dimnames

  185. mdlapply():

    lapply for multidimensional arrays

  186. arr.of.lists.2.df():

    simplify 2D-list-array to a DF

  187. mdlapply2df():

    multi dimensional lapply + arr.of.lists.2.df (simplify 2D-list-array to a DF)

  188. memory.biggest.objects():

    Show distribution of the largest objects and return their names

  189. na.omit.strip():

    Calls na.omit() and returns a clean vector

  190. md.LinkTable():

    Take a dataframe where every entry is a string containing an html link, parse and write out

  191. Parse google search query links to your list of gene symbols. Strings “prefix” and “suffix” will be searched for together with each gene (“Human ID4 neurons”). See many additional services in DatabaseLinke.R.

  192. Parse bing search query links to your list of gene symbols. Strings “prefix” and “suffix” will be searched for together with each gene (“Human ID4 neurons”). See many additional services in DatabaseLinke.R.

  193. val2col():

    This function converts a vector of values (“yourdata”) to a vector of color levels. One must define the number of colors. The limits of the color scale (“zlim”) or the break points for the color changes (“breaks”) can also be defined. When breaks and zlim are defined, breaks overrides zlim.

  194. as.logical.wNames():

    Converts your input vector into a logical vector, and puts the original character values into the names of the new vector, unless it already has names.

  195. iterBy.over():

    Iterate over a vector by every N-th element.

  196. sourcePartial():

    Source parts of another script. Source: stackoverflow.

  197. oo():

    Open current working directory.

  198. jjpegA4():

    Setup an A4 size jpeg

  199. param.list.2.fname():

    Take a list of parameters and parse a string from their names and values.

  200. GC_content():

    GC-content of a string (frequency of G and C letters among all letters).

  201. eucl.dist.pairwise():

    Calculate pairwise euclidean distance

  202. sign.dist.pairwise():

    Calculate absolute value of the pairwise euclidean distance

  203. reverse.list.hierarchy():

    reverse list hierarchy

  204. extPDF():

    add pdf as extension to a file name

  205. extPNG():

    add png as extension to a file name

  206. col2named.vec.tbl():

    Convert a 2-column table (data frame) into a named vector. 1st column will be used as names.
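Several of the statistics helpers listed above (sem(), geomean(), fano(), rowCV()) correspond to standard formulas: SEM = sd / sqrt(n), geometric mean = exp(mean(log(x))), Fano factor = variance / mean, CV = sd / mean. The sketch below shows one way to write them in base R. It is not the code from CodeAndRoll.R: the `_sketch` names, the `na.rm` argument and its default are assumptions based only on the descriptions ("it excludes NA-s by default").

```r
# Hypothetical re-implementations, named *_sketch so they cannot mask the real functions.
sem_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]          # drop NAs before counting n
  sd(x) / sqrt(length(x))               # standard error of the mean
}
geomean_sketch <- function(x, na.rm = TRUE) {
  exp(mean(log(x), na.rm = na.rm))      # geometric mean via the mean of logs
}
fano_sketch <- function(x, na.rm = TRUE) {
  var(x, na.rm = na.rm) / mean(x, na.rm = na.rm)   # variance over mean
}
row_cv_sketch <- function(m) {
  # coefficient of variation (sd / mean) for each row of a numeric matrix
  apply(m, 1, function(r) sd(r, na.rm = TRUE) / mean(r, na.rm = TRUE))
}

x <- c(1, 2, 4, NA)
m <- matrix(c(1, 2, 3,
              2, 4, 6), nrow = 2, byrow = TRUE)

x_sem     <- sem_sketch(x)
x_geomean <- geomean_sketch(x)    # cube root of 1 * 2 * 4
x_fano    <- fano_sketch(x)
m_row_cv  <- row_cv_sketch(m)

print(c(sem = x_sem, geomean = x_geomean, fano = x_fano))
print(m_row_cv)
```

The row-wise helper uses apply() for clarity; for large matrices a vectorised version built on rowMeans() would be faster.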


Get CodeAndRoll. Vertesy, 2020. DOI
