If you use these functions, please star the repo, or cite via DOI. Thanks!
CodeAndRoll
CodeAndRoll is a collection of custom R functions. It works with MarkdownReports and SeuratUtils, but also as a standalone set of more than 200 productivity tools. Many of my other repos/libraries may depend on these functions. Source: own work + web (sources are referenced in the description and/or source code). Intended for my personal use, shared because others may find (parts of) it useful.
News
CodeAndRoll (v1) repository is decommissioned.
Use the packages below:
Install
1.) Download CodeAndRoll.R, save it as a local .R file, and run source("~/path/to/CodeAndRoll.R").
2.) Directly source from the web:
source("https://raw.githubusercontent.com/vertesy/CodeAndRoll/master/CodeAndRoll.R")
Troubleshooting
If you encounter a bug, or something doesn’t work or is unclear, please let me know by raising an issue on CodeAndRoll. Please check first whether it has already been asked.
Usage
After source("~/path/to/CodeAndRoll.R"), you can use any of the functions listed below. Some of the functions have a minimal example written in the .R scripts, just below each function’s definition.
Chapters
The script is roughly organised in the following sections / categories:
-
File handling, export, import [read & write]
-
Clipboard interaction (OS X)
-
Reading files in
-
Writing files out
-
Vector operations
-
Vector filtering
-
Matrix operations
-
List operations
-
Set operations
-
Math and stats
-
String operations
-
Plotting and Graphics
-
Read and write plotting functions
-
Generic
-
Plots
-
New additions
List of Functions
Note that this library is under continuous development. Thus, not all functions listed here may still be in CodeAndRoll.R, and vice versa: new functions in CodeAndRoll.R may not be listed here. Backward compatibility is usually, but not always, maintained. See other files in the repo if you are missing a function.
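Because the list below may be out of sync with the script, it can help to check which functions are actually defined after sourcing. The following is a minimal sketch in base R, not part of CodeAndRoll itself. It assumes a stand-in script written to a temporary file (its contents and the names in `wanted` are purely illustrative); in practice you would point `script` at your local copy of CodeAndRoll.R.

```r
# Stand-in for a local copy of CodeAndRoll.R (illustrative placeholder content).
script <- tempfile(fileext = ".R")
writeLines(c(
  "ppp <- function(...) NULL  # placeholder body",
  "sem <- function(x) NULL    # placeholder body"
), script)

# Source into a separate environment so the global workspace stays clean.
car <- new.env()
source(script, local = car)

# Function names we expect, e.g. copied from the list in this README.
wanted <- c("ppp", "sem", "rowSEM")

# TRUE where a function of that name was defined by the sourced script.
available <- vapply(
  wanted,
  function(nm) {
    exists(nm, envir = car, inherits = FALSE) &&
      is.function(get(nm, envir = car))
  },
  logical(1)
)
print(available)

# Names that are listed but not defined in this version of the script.
missing_funs <- names(available)[!available]
```

Sourcing into a dedicated environment is a design choice made for this sketch only; a plain source() call, as described under Usage, defines everything in the global environment instead.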
String operations
-
ppp()
:Paste by point
-
pps()
:Paste by (forward) slash
-
ppu()
:Paste by underscore
-
ppd()
:Paste by dash
-
kpp()
:kollapse by point
-
kppu()
:kollapse by underscore
-
kppd()
:kollapse by dash
-
stry()
:Silent try
-
say()
:Use system voice to notify (after a long task is done)
-
sayy()
:Use system voice to notify (after a long task is done)
-
grepv()
:grep returning the value.
-
unload()
:Unload a package. Source Stackoverflow.
-
clip2clip.vector()
:Copy from the clipboard (e.g. Excel), convert to an R-formatted vector, and copy the result back to the clipboard
-
clip2clip.commaSepString()
:read a comma separated string (e.g. list of gene names) and properly format it for R.
-
read.simple.vec()
:Read each line of a file to an element of a vector (read in new-line separated values, no header!).
-
read.simple()
:It is essentially read.table() with file/path parsing.
-
read.simple_char_list()
:Read in a file.
-
read.simple.table()
:Read in a file. Default: the header defines colnames, no rownames. For rownames, give the number of the column containing them, e.g. 1. The header should start with a TAB / the first column name should be empty.
-
FirstCol2RowNames()
:Set First Col to Row Names
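The paste helpers and `grepv()` at the top of this list can be expressed as thin wrappers around base R. The sketch below only illustrates that idea: the function names are hypothetical stand-ins chosen for this example, and the actual definitions and arguments in CodeAndRoll.R may differ.

```r
# Hypothetical stand-ins; names and signatures are assumptions,
# not the definitions found in CodeAndRoll.R.
paste_by_point      <- function(...) paste(..., sep = ".")
paste_by_slash      <- function(...) paste(..., sep = "/")
paste_by_underscore <- function(...) paste(..., sep = "_")

# grep() that returns the matching values instead of their indices.
grep_values <- function(pattern, x, ...) grep(pattern, x, value = TRUE, ...)

paste_by_point("sample", 1, "tsv")                # "sample.1.tsv"
paste_by_slash("data", "raw")                     # "data/raw"
paste_by_underscore("run", "A")                   # "run_A"
grep_values("^Hox", c("Hoxa1", "Actb", "Hoxb4"))  # "Hoxa1" "Hoxb4"
```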
File handling, export, import
-
read.simple.tsv()
:Read in a file with excel style data: rownames in col1, headers SHIFTED. The header should start with a TAB / First column name should be empty.
-
read.simple.csv()
:Read in a file with excel style data: rownames in col1, headers SHIFTED. The header should start with a TAB / First column name should be empty.
-
read.simple.ssv()
:Space separated values. Read in a file with excel style data: rownames in col1, headers SHIFTED. The header should start with a TAB / First column name should be empty.
-
read.simple.tsv.named.vector()
:Read in a file with excel style named vectors, names in col1, headers SHIFTED. The header should start with a TAB / First column name should be empty.
-
convert.tsv.data()
:Fix NA issue in dataframes imported by the new read.simple.tsv. Set na_rep to NA if you want to keep NA-s
-
read.simple.xls()
:Read multi-sheet excel files. row_namePos = NULL for automatic names
-
write.simple()
:Write out a matrix-like R-object to a file as tab separated values (.tsv). Your output filename will be the variable’s name. The output file will be located in “OutDir” specified by you at the beginning of the script, or under your current working directory. You can pass the PATH and VARIABLE separately (in order); they will be concatenated to form the filename.
-
write.simple.vec()
:Write out a vector-like R-object to a file as newline separated values (.vec). Your output filename will be the variable’s name. The output file will be located in “OutDir” specified by you at the beginning of the script, or under your current working directory. You can pass the PATH and VARIABLE separately (in order); they will be concatenated to form the filename.
-
write.simple.xlsx()
:Write out a list of matrices / data frames WITH ROW- AND COLUMN-NAMES to an Excel (.xlsx) file. Your output filename will be the variable’s name. The output file will be located in “OutDir” specified by you at the beginning of the script, or under your current working directory. You can pass the PATH and VARIABLE separately (in order); they will be concatenated to form the filename.
-
write.simple.append()
:Append an R-object WITHOUT ROWNAMES to an existing .tsv file with the same number of columns. Your output filename will be the variable’s name. The output file will be located in “OutDir” specified by you at the beginning of the script, or under your current working directory. You can pass the PATH and VARIABLE separately (in order); they will be concatenated to form the filename.
-
sstrsplit()
:Alias for str_split_fixed in the stringr package
-
topN.dfCol()
:Find the n highest values in a named vector
-
bottomN.dfCol()
:Find the n lowest values in a named vector
-
as.named.vector()
:Convert a dataframe column or row into a vector, keeping the corresponding dimension name.
-
col2named.vector()
:Convert a dataframe column into a vector, keeping the corresponding dimension name.
-
row2named.vector()
:Convert a dataframe row into a vector, keeping the corresponding dimension name.
-
as.numeric.wNames()
:Converts any vector into a numeric vector, and puts the original character values into the names of the new vector, unless it already has names. Useful for coloring a plot by categories, name-tags, etc.
-
as.numeric.wNames.old()
:Converts any vector into a numeric vector, and puts the original character values into the names of the new vector, unless it already has names. Useful for coloring a plot by categories, name-tags, etc.
-
as.character.wNames()
:Converts your input vector into a character vector, and puts the original character values into the names of the new vector, unless it already has names.
-
rescale()
:linear transformation to a given range of values
-
flip_value2name()
:Flip the values and the names of a vector with names
-
sortbyitsnames()
:Sort a vector by the alphanumeric order of its names (instead of its values).
-
any.duplicated()
:How many entries are duplicated
-
which.duplicated()
:orig =rownames(sc@expdata)
-
which.NA()
:orig =rownames(sc@expdata)
-
which_names()
:Return the names where the input vector is TRUE. The input vector is converted to logical.
-
which_names_grep()
:Return the vector elements whose names are partially matched
-
na.omit.mat()
:Omit rows with NA values from a matrix. Rows with any, or full of NA-s
-
inf.omit()
:Omit infinite values from a vector.
-
zero.omit()
:Omit zero values from a vector.
-
pc_TRUE()
:Percentage of true values in a logical vector, parsed as text (useful for reports.)
-
NrAndPc()
:Summary stat. text formatting for logical vectors (%, length)
-
pc_in_total_of_match()
:Percentage of a certain value within a vector or table.
-
filter_survival_length()
:Parse a sentence reporting the % of filter survival.
-
remove_outliers()
:Remove values that fall outside the trailing N % of the distribution.
-
simplify_categories()
:Replace every entry that is found in “replaceit”, by a single value provided by “to”
-
rotate()
:rotate a matrix 90 degrees.
-
sortEachColumn()
:Sort each column of a numeric matrix / data frame.
-
rowMedians()
:Calculates the median of each row of a numeric matrix / data frame.
-
colMedians()
:Calculates the median of each column of a numeric matrix / data frame.
-
rowGeoMeans()
:Calculates the geometric mean of each row of a numeric matrix / data frame.
-
colGeoMeans()
:Calculates the geometric mean of each column of a numeric matrix / data frame.
-
rowCV()
:Calculates the CV of each ROW of a numeric matrix / data frame.
-
colCV()
:Calculates the CV of each column of a numeric matrix / data frame.
-
rowVariance()
:Calculates the variance of each row of a numeric matrix / data frame.
-
colVariance()
:Calculates the variance of each column of a numeric matrix / data frame.
-
rowMin()
:Calculates the minimum of each row of a numeric matrix / data frame.
-
colMin()
:Calculates the minimum of each column of a numeric matrix / data frame.
-
rowMax()
:Calculates the maximum of each row of a numeric matrix / data frame.
-
colMax()
:Calculates the maximum of each column of a numeric matrix / data frame.
-
rowSEM()
:Calculates the SEM of each row of a numeric matrix / data frame.
-
colSEM()
:Calculates the SEM of each column of a numeric matrix / data frame.
-
rowSD()
:Calculates the standard deviation (SD) of each row of a numeric matrix / data frame.
-
colSD()
:Calculates the standard deviation (SD) of each column of a numeric matrix / data frame.
-
rowIQR()
:Calculates the interquartile range (IQR) of each row of a numeric matrix / data frame.
-
colIQR()
:Calculates the interquartile range (IQR) of each column of a numeric matrix / data frame.
-
rowquantile()
:Calculates the quantiles of each row of a numeric matrix / data frame.
-
colquantile()
:Calculates the quantiles of each column of a numeric matrix / data frame.
-
row.Zscore()
:Calculate Z-score over rows of data frame.
-
rowACF()
:RETURNS A LIST. Calculates the autocorrelation of each row of a numeric matrix / data frame.
-
colACF()
:RETURNS A LIST. Calculates the autocorrelation of each column of a numeric matrix / data frame.
-
acf.exactLag()
:Autocorrelation with exact lag
-
rowACF.exactLag()
:RETURNS A Vector for the “lag” based autocorrelation. Calculates the autocorrelation of each row of a numeric matrix / data frame.
-
colACF.exactLag()
:RETURNS A Vector for the “lag” based autocorrelation. Calculates the autocorrelation of each column of a numeric matrix / data frame.
-
colDivide()
:divide by column
-
rowDivide()
:divide by row
-
sort.mat()
:Sort a matrix. ALTERNATIVE: dd[with(dd, order(-z, b)), ]. Source: stackoverflow.
-
rowNameMatrix()
:Create a copy of your matrix, where every entry is replaced by the corresponding row name. Useful if you want to color by row name in a plot (where you have different number of NA-values in each row).
-
colNameMatrix()
:Create a copy of your matrix, where every entry is replaced by the corresponding column name. Useful if you want to color by column name in a plot (where you have different number of NA-values in each column).
-
colsplit()
:split a data frame by a factor corresponding to columns.
-
rowsplit()
:split a data frame by a factor corresponding to rows.
-
TPM_normalize()
:normalize each column to 1 million
-
median_normalize()
:normalize each column to the median of all the column-sums
-
mean_normalize()
:normalize each column to the mean of all the column-sums
-
rownames.trimws()
:trim whitespaces from the rownames
-
select.rows.and.columns()
:Subset rows and columns. It checks if the selected dimension names exist and reports any that aren’t found.
-
getRows()
:Get the subset of rows with existing rownames, and report how many it could not find.
-
getCols()
:Get the subset of cols with existing colnames, and report how many it could not find.
-
get.oddoreven()
:Get odd or even columns or rows of a data frame
-
combine.matrices.intersect()
:combine matrices by rownames intersect
-
merge_dfs_by_rn()
:Merge any data frames by rownames. Requires the plyr package.
-
merge_numeric_df_by_rn()
:Merge 2 numeric data frames by rownames
-
attach_w_rownames()
:Take a data frame (of e.g. metadata) from your memory space, split it into vectors so you can directly use them. E.g.: Instead of metadata$color[blabla] use color[blabla]
-
panel.cor.pearson()
:A function to display correlation values for pairs() function. Default is pearson correlation, that can be set to “kendall” or “spearman”.
-
panel.cor.spearman()
:A function to display correlation values for pairs() function. Default is spearman correlation, that can be set to “kendall” or “pearson”.
-
remove.na.rows()
:cols have to be a vector of numbers corresponding to columns
-
remove.na.cols()
:cols have to be a vector of numbers corresponding to columns
-
intersect.ls()
:Intersect any number of lists.
-
union.ls()
:Union of any number of list elements. Faster than reduce.
-
unlapply()
:lapply, then unlist
-
list.wNames()
:create a list with names from ALL variables you pass on to the function
-
as.list.df.by.row()
:Split a dataframe into a list by its rows. omit.empty applies to the list elements; na.omit and zero.omit are applied to the entries inside each list element.
-
as.list.df.by.col()
:Split a dataframe into a list by its columns. omit.empty applies to the list elements; na.omit and zero.omit are applied to the entries inside each list element.
-
reorder.list()
:reorder elements of lists in your custom order of names / indices.
-
range.list()
:range of values in whole list
-
intermingle2lists()
:Combine 2 lists (of the same length) so that they form the odd and the even elements of a unified list. Useful for side-by-side comparisons, e.g. in wstripchart_list().
-
as.listalike()
:convert a vector to a list with certain dimensions, taken from the list it should resemble
-
list2fullDF.byNames()
:Convert a list to a full matrix. Rows = names(union.ls(your_list)) or all names of within list elements, columns = names(your_list).
-
list2fullDF.presence()
:Convert a list to a full matrix. Designed for occurrence counting, think of table(). Rows = all ENTRIES within your list, columns = names(your_list).
-
splitbyitsnames()
:split a list by its names
-
splititsnames_byValues()
:split a list by its names
-
intermingle2vec()
:Combine 2 vectors (of the same length) so that they form the odd and the even elements of a unified vector.
-
intermingle.cbind()
:Combine 2 data frames (of the same length) so that they form the odd and the even elements of a unified list. Useful for side-by-side comparisons, e.g. in wstripchart_list().
-
pad.na()
:Fill up a vector to a given length with NA-values at the end.
-
clip.values()
:Signal clipping. Cut values above or below a threshold.
-
clip.outliers()
:Signal clipping based on the input data’s distribution. It clips values above or below the extreme N% of the distribution.
-
ls2categvec()
:Convert a list to a vector repeating list-element names, while vector names are the list elements
-
symdiff()
:Quasi symmetric difference of any number of vectors
-
sem()
:Calculates the standard error of the mean (SEM) for a numeric vector (it excludes NA-s by default)
-
fano()
:Calculates the fano factor on a numeric vector (it excludes NA-s by default)
-
geomean()
:Calculates the geometric mean of a numeric vector (it excludes NA-s by default)
-
mean_of_log()
:Calculates the mean of the log_k of a numeric vector (it excludes NA-s by default)
-
movingAve()
:Calculates the moving / rolling average of a numeric vector.
-
movingAve2()
:
-
movingSEM()
:Calculates the moving / rolling standard error of the mean (SEM) on a numeric vector.
-
imovingSEM()
:Calculates the moving / rolling standard error of the mean (SEM). It calculates it to the edge of the vector with incrementally smaller window-size.
-
eval_parse_kollapse()
:evaluate and parse (dyn_var_caller)
-
lookup()
:Awesome pattern matching for a set of values in another set of values. Returns a list with all kinds of results.
-
richColors()
:Alias for rich.colors in gplots
-
Color_Check()
:Display the colors encoded by the numbers / color-ID-s you pass on to this function
-
colSums.barplot()
:Draw a barplot from ColSums of a matrix.
-
lm_equation_formatter()
:Renders the lm() function’s output into a human readable text. (e.g. for subtitles)
-
lm_equation_formatter2()
:Renders the lm() function’s output into a human readable text. (e.g. for subtitles)
-
lm_equation_formatter3()
:Renders the lm() function’s output into a human readable text. (e.g. for subtitles)
-
hist.XbyY()
:Split one variable by another. Calculates equal bins in splitby, and returns a list of the corresponding values in toSplit.
-
flag.name_value()
:Returns the name and its value, if it is not FALSE.
-
flag.nameiftrue()
:Returns the name and its value, if it is TRUE.
-
flag.names_list()
:Returns the name and value of each element in a list of parameters.
-
param.list.flag()
:Returns the name and value of each element in a list of parameters.
-
quantile_breaks()
:Quantile breakpoints in any data vector. Source: slowkow.com.
-
vec.fromNames()
:create a vector from a vector of names
-
list.fromNames()
:create list from a vector with the names of the elements
-
matrix.fromNames()
:Create a matrix from 2 vectors defining the row- and column names of the matrix. Default fill value: NA.
-
matrix.fromVector()
:Create a matrix from values in a vector repeated for each column / each row. Similar to rowNameMatrix and colNameMatrix.
-
array.fromNames()
:create an N-dimensional array from N vectors defining the row-, column, etc names of the array
-
what()
:A better version of is(). It can print the first “printme” elements.
-
idim()
:A dim() function that can handle if you pass on a vector: then, it gives the length.
-
idimnames()
:A dimnames() function that can handle if you pass on a vector: it gives back the names.
-
table_fixed_categories()
:generate a table() with a fixed set of categories. It fills up the table with missing categories, that are relevant when comparing to other vectors.
-
stopif2()
:Stop script if the condition is met. You can parse anything (e.g. variables) in the message
-
most_frequent_elements()
:Show the most frequent elements of a table
-
top_indices()
:Returns the position / index of the n highest values. For equal values, it maintains the original order
-
percentile2value()
:Calculate the actual value of the N-th percentile in a distribution or set of numbers. Useful for calculating cutoffs, and displaying them by whist()’s “vline” parameter.
-
MaxN()
:find second (third…) highest/lowest value in vector
-
hclust.getOrder.row()
:Extract ROW order from a pheatmap object.
-
hclust.getOrder.col()
:Extract COLUMN order from a pheatmap object.
-
hclust.getClusterID.row()
:Extract cluster ID’s for ROWS of a pheatmap object.
-
hclust.getClusterID.col()
:Extract cluster ID’s for COLUMNS of a pheatmap object.
-
hclust.ClusterSeparatingLines.row()
:Calculate the position of ROW separating lines between clusters in a pheatmap object.
-
hclust.ClusterSeparatingLines.col()
:Calculate the position of COLUMN separating lines between clusters in a pheatmap object.
-
Gap.Postions.calc.pheatmap()
:calculate gap positions for pheatmap, based on a sorted annotation vector of categories
-
matlabColors.pheatmap()
:Create a Matlab-like color gradient using “colorRamps”.
-
annot_col.create.pheatmap.vec()
:For VECTORS. Auxiliary function for pheatmap. Prepares the 2 variables needed for “annotation_col” and “annotation_colors” in pheatmap
-
annot_col.create.pheatmap.df()
:For data frames. Auxiliary function for pheatmap. Prepares the 2 variables needed for “annotation_col” and “annotation_colors” in pheatmap
-
annot_col.fix.numeric()
:fix class and color annotation in pheatmap annotation data frames and lists.
-
annot_row.create.pheatmap.df()
:For data frames. Auxiliary function for pheatmap. Prepares the 2 variables needed for “annotation_row” and “annotation_colors” in pheatmap
-
wPairConnector()
:Connect Pairs of datapoints with a line on a plot.
-
numerate()
:numerate from x to y with additional zero-padding
-
printEveryN()
:Report at every e.g. 1000
-
zigzagger()
:mix entries so that they differ
-
irequire()
:Load a package. If it does not exist, try to install it from CRAN.
-
IfExistsAndTrue()
:Internal function. Checks if a variable is defined, and its value is TRUE.
-
filter_InCircle()
:Find points in/out-side of a circle.
-
cumsubtract()
:Cumulative subtraction, opposite of cumsum()
-
trail()
:A combination of head() and tail() to see both ends.
-
sort.decreasing()
:Sort in decreasing order.
-
list.2.replicated.name.vec()
:Convert a list to a vector of list-element names, where each name is repeated as many times as its element has entries.
-
idate()
:Parse current date, dot separated.
-
view.head()
:view the head of an object by console.
-
view.head2()
:view the head of an object by View().
-
iidentical.names()
:Test if the names of two objects are exactly equal
-
iidentical()
:Test if two objects are exactly equal
-
iidentical.all()
:Test if two objects are exactly equal.
-
parsepvalue()
:Parse p-value from a number to a string.
-
shannon.entropy()
:Calculate shannon entropy
-
id2titlecaseitalic()
:Convert a gene ID to title case italic
-
id2titlecaseitalic.sp()
:Convert a gene ID to italic
-
id2name()
:Convert a gene ID to a gene name (symbol). From / for RaceID.
-
id2chr()
:Convert a gene ID to the chromosome. From / for RaceID.
-
name2id()
:Convert a name to a gene ID. From / for RaceID.
-
name2id.toClipboard()
:Convert a name to a gene ID, and copy it to the clipboard. From / for RaceID.
-
name2id.fast()
:Convert a name to a gene ID. From / for RaceID.
-
legend.col()
:Legend color. Source: aurelienmadouasse.wordpress.com.
-
copy.dimension.and.dimnames()
:copy dimension and dimnames
-
mdlapply()
:lapply for multidimensional arrays
-
arr.of.lists.2.df()
:simplify 2D-list-array to a DF
-
mdlapply2df()
:multi dimensional lapply + arr.of.lists.2.df (simplify 2D-list-array to a DF)
-
memory.biggest.objects()
:Show distribution of the largest objects and return their names
-
na.omit.strip()
:Calls na.omit() and returns a clean vector
-
md.LinkTable()
:Take a dataframe where every entry is a string containing an html link, parse and write out
-
link_google()
:Parse google search query links to your list of gene symbols. Strings “prefix” and “suffix” will be searched for together with each gene (“Human ID4 neurons”). See many additional services in DatabaseLinke.R.
-
link_bing()
:Parse bing search query links to your list of gene symbols. Strings “prefix” and “suffix” will be searched for together with each gene (“Human ID4 neurons”). See many additional services in DatabaseLinke.R.
-
val2col()
:This function converts a vector of values (“yourdata”) to a vector of color levels. One must define the number of colors. The limits of the color scale (“zlim”) or the break points for the color changes (“breaks”) can also be defined. When breaks and zlim are defined, breaks overrides zlim.
-
as.logical.wNames()
:Converts your input vector into a logical vector, and puts the original character values into the names of the new vector, unless it already has names.
-
iterBy.over()
:Iterate over a vector by every N-th element.
-
sourcePartial()
:Source parts of another script. Source: stackoverflow.
-
oo()
:Open current working directory.
-
jjpegA4()
:Setup an A4 size jpeg
-
param.list.2.fname()
:Take a list of parameters and parse a string from their names and values.
-
GC_content()
:GC-content of a string (frequency of G and C letters among all letters).
-
eucl.dist.pairwise()
:Calculate pairwise euclidean distance
-
sign.dist.pairwise()
:Calculate absolute value of the pairwise euclidean distance
-
reverse.list.hierarchy()
:reverse list hierarchy
-
extPDF()
:add pdf as extension to a file name
-
extPNG()
:add png as extension to a file name
-
col2named.vec.tbl()
:Convert a 2-column table (data frame) into a named vector. 1st column will be used as names.
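Several of the statistics helpers listed above (`sem()`, `fano()`, `geomean()`, and their row- and column-wise variants) correspond to standard formulas. The sketch below shows those formulas in base R under stated assumptions: the function names are hypothetical stand-ins, NA values are dropped by default as the descriptions state, and the actual implementations in CodeAndRoll.R may differ.

```r
# Hypothetical stand-ins illustrating the formulas; not the library's own code.

# Standard error of the mean: sd / sqrt(n), with NA values removed first.
sem_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Fano factor: variance divided by the mean.
fano_sketch <- function(x, na.rm = TRUE) {
  var(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Geometric mean via logs; assumes strictly positive values.
geomean_sketch <- function(x, na.rm = TRUE) {
  exp(mean(log(x), na.rm = na.rm))
}

x <- c(1, 2, 4, 8, NA)
sem_x     <- sem_sketch(x)
fano_x    <- fano_sketch(x)
geomean_x <- geomean_sketch(x)

# Row-wise variant via apply(); MARGIN = 2 would give the column-wise one.
m <- matrix(c(1, 2, 4, 8,
              2, 2, 2, 2), nrow = 2, byrow = TRUE)
row_geomeans <- apply(m, 1, geomean_sketch)
```

Using apply() keeps the sketch short; for large matrices, vectorised alternatives such as rowMeans() on log-transformed data are usually faster.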
Get CodeAndRoll. Vertesy, 2020.