csverve.api package

Submodules

csverve.api.api module

csverve.api.api.add_col_from_dict(infile, col_data, outfile, dtypes, skip_header=False, **kwargs)

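Add a column to a gzipped CSV.

Parameters:

infile – Path to the input gzipped CSV file.
col_data – Dictionary holding the data for the new column.
outfile – Path of the resulting gzipped CSV file.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.
skip_header – boolean, True = skip writing header, False = write header.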

Returns:

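csverve.api.api.annotate_csv(infile: str, annotation_df: DataFrame, outfile, annotation_dtypes, on='cell_id', skip_header: bool = False, **kwargs)

Add annotation columns from a DataFrame to a CSV file.

Parameters:

infile – Path to the input CSV file.
annotation_df – pandas DataFrame holding the annotation columns.
outfile – Path of the resulting annotated CSV file.
annotation_dtypes – Dictionary of pandas dtypes for the annotation columns, where key = column name, value = dtype.
on – Column to join the annotations on (default 'cell_id').
skip_header – boolean, True = skip writing header, False = write header.

Returns: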
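
A minimal sketch of annotating rows by a key column, using only the standard library. It is not csverve's implementation: the function name `annotate_rows` and the dictionary-of-dictionaries annotation format are hypothetical, and the sketch assumes a gzipped input file with a header row. It only illustrates the idea of matching annotations to rows through the `on` column.

```python
import csv
import gzip


def annotate_rows(infile, annotations, outfile, on="cell_id"):
    """Append annotation columns to each row, matched on the `on` column.

    `annotations` maps a key value to a dict of new column values.
    Hypothetical stand-in for illustration; not part of csverve.
    """
    # Collect the new column names in first-seen order.
    new_cols = []
    for values in annotations.values():
        for name in values:
            if name not in new_cols:
                new_cols.append(name)

    with gzip.open(infile, "rt", newline="") as src, \
            gzip.open(outfile, "wt", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + new_cols)
        writer.writeheader()
        for row in reader:
            # Rows without a matching annotation get empty strings.
            extra = annotations.get(row[on], {})
            for name in new_cols:
                row[name] = extra.get(name, "")
            writer.writerow(row)
```

Rows whose key has no annotation are kept and padded with empty values, which corresponds to a left join on the key column.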

csverve.api.api.concatenate_csv(inputfiles: List[str], output: str, skip_header: bool = False, drop_duplicates: bool = False, **kwargs) → None

Concatenate gzipped CSV files, dtypes in meta YAML files must be the same.

Parameters:

inputfiles – List of gzipped CSV file paths, or a dictionary where the keys are file paths.
output – Path of resulting concatenated gzipped CSV file and meta YAML.
skip_header – boolean, True = skip writing header, False = write header.

Returns:

None.
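
A minimal sketch of the concatenation idea, using only the standard library. It is not csverve's implementation: the function name `concatenate_gzipped_csvs` is hypothetical, every input is assumed to carry a header row, and the sketch compares header rows instead of reading the dtypes from the meta YAML files.

```python
import csv
import gzip


def concatenate_gzipped_csvs(inputfiles, output, skip_header=False):
    """Concatenate gzipped CSVs that share the same header row.

    Hypothetical stand-in for illustration; it does not read or write
    the meta YAML files that csverve relies on.
    """
    expected = None
    with gzip.open(output, "wt", newline="") as dst:
        writer = csv.writer(dst)
        for path in inputfiles:
            with gzip.open(path, "rt", newline="") as src:
                reader = csv.reader(src)
                header = next(reader)
                if expected is None:
                    expected = header
                    # Write the header once, unless the caller skips it.
                    if not skip_header:
                        writer.writerow(header)
                elif header != expected:
                    raise ValueError(f"{path}: header differs from first file")
                # Copy the remaining data rows unchanged.
                writer.writerows(reader)
```

Rejecting mismatched headers up front mirrors the requirement that the inputs describe the same columns.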

csverve.api.api.concatenate_csv_files_pandas(in_filenames: Union[List[str], Dict[str, str]], out_filename: str, dtypes: Dict[str, str], skip_header: bool = False, drop_duplicates: bool = False, **kwargs) → None

Concatenate gzipped CSV files.

Parameters:

in_filenames – List of gzipped CSV file paths, or a dictionary where the keys are file paths.
out_filename – Path of resulting concatenated gzipped CSV file and meta YAML.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.
skip_header – boolean, True = skip writing header, False = write header.

Returns:

None.

csverve.api.api.concatenate_csv_files_quick_lowmem(inputfiles: List[str], output: str, dtypes: Dict[str, str], columns: List[str], skip_header: bool = False, **kwargs) → None

Concatenate gzipped CSV files.

Parameters:

inputfiles – List of gzipped CSV file paths.
output – Path of resulting concatenated gzipped CSV file and meta YAML.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.
columns – List of column names for newly concatenated gzipped CSV file.
skip_header – boolean, True = skip writing header, False = write header.

Returns:

None.

csverve.api.api.get_columns(infile)

csverve.api.api.get_dtypes(infile)

csverve.api.api.merge_csv(in_filenames: Union[List[str], Dict[str, str]], out_filename: str, how: str, on: List[str], skip_header: bool = False, **kwargs) → None

Create one gzipped CSV out of multiple gzipped CSVs.

Parameters:

in_filenames – List of gzipped CSV file paths, or a dictionary containing file paths as keys.
out_filename – Path to the newly merged CSV.
how – How to join DataFrames (inner, outer, left, right).
on – Column(s) to join on, comma separated if multiple.
skip_header – boolean, True = skip writing header, False = write header.

Returns:

None.
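
To make the four `how` options concrete, here is a small pure-Python sketch of joining two tables on one or more key columns. It is an illustration only: `merge_rows` is a hypothetical helper operating on lists of dictionaries, not the DataFrame-based merge that the library performs, and unmatched rows simply lack the other table's columns instead of receiving missing-value markers.

```python
def merge_rows(left, right, on, how="inner"):
    """Join two lists of dict rows on the columns listed in `on`.

    Hypothetical helper for illustration; row order is not guaranteed
    to match what a DataFrame merge would produce.
    """
    if how not in {"inner", "outer", "left", "right"}:
        raise ValueError(f"unsupported how: {how}")

    def key(row):
        return tuple(row[col] for col in on)

    # Index the right-hand rows by their join key.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(key(row), []).append(row)

    merged, matched = [], set()
    for lrow in left:
        k = key(lrow)
        if k in right_by_key:
            matched.add(k)
            merged.extend({**lrow, **rrow} for rrow in right_by_key[k])
        elif how in {"left", "outer"}:
            # Keep left rows that have no partner on the right.
            merged.append(dict(lrow))

    if how in {"right", "outer"}:
        # Keep right rows that were never matched.
        for k, rrows in right_by_key.items():
            if k not in matched:
                merged.extend(dict(r) for r in rrows)
    return merged
```

With `how="inner"` only keys present in both tables survive; `left` and `right` additionally keep unmatched rows from one side, and `outer` keeps them from both.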

csverve.api.api.read_csv(infile: str, chunksize: Optional[int] = None, usecols=None, dtype=None) → DataFrame

Read in CSV file and return as a pandas DataFrame.

Assumes a YAML meta file in the same path with the same name, with a .yaml extension. The YAML file structure is documented at the top of the module's source file.

Parameters:

infile – Path to CSV file.
chunksize – Number of rows to read at a time (optional, applies to large datasets).
usecols – Restrict to specific columns (optional).
dtype – Override the dtypes on specific columns (optional).

Returns:

pandas DataFrame.
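
The sketch below illustrates two ideas from this entry with the standard library only: locating the meta file next to the data file, and reading a large file a fixed number of rows at a time. Both helpers are hypothetical. In particular, appending `.yaml` to the full file name is an assumed reading of "same name, with a .yaml extension", the input is assumed to be gzipped, and how the library itself handles `chunksize` internally is not stated here.

```python
import csv
import gzip
from itertools import islice


def meta_path_for(infile):
    # Assumption: the meta file name is the data file name plus ".yaml".
    return infile + ".yaml"


def read_in_chunks(infile, chunksize, usecols=None):
    """Yield lists of up to `chunksize` row dicts from a gzipped CSV.

    Hypothetical helper for illustration; values stay as strings
    because no dtypes are applied.
    """
    with gzip.open(infile, "rt", newline="") as src:
        reader = csv.DictReader(src)
        while True:
            chunk = list(islice(reader, chunksize))
            if not chunk:
                return
            if usecols is not None:
                # Keep only the requested columns.
                chunk = [{col: row[col] for col in usecols} for row in chunk]
            yield chunk
```

Reading in chunks keeps memory use bounded by `chunksize` rows rather than by the size of the file.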

csverve.api.api.remove_duplicates(filepath: str, outputfile: str, skip_header: bool = False) → None

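Remove duplicate rows.

Assumes a YAML meta file in the same path with the same name, with a .yaml extension. The YAML file structure is documented at the top of the module's source file.

Parameters:

filepath – Path to the input CSV file.
outputfile – Path to the deduplicated output CSV file.
skip_header – boolean, True = skip writing header, False = write header.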
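
A minimal standard-library sketch of row deduplication. It is not the library's implementation: `drop_duplicate_rows` is a hypothetical name, the input is assumed to be gzipped with a header row, and keeping the first occurrence of each duplicated row is an assumption about the intended behaviour.

```python
import csv
import gzip


def drop_duplicate_rows(filepath, outputfile, skip_header=False):
    """Copy a gzipped CSV, keeping only the first copy of each row.

    Hypothetical helper for illustration; the meta YAML file is ignored.
    """
    seen = set()
    with gzip.open(filepath, "rt", newline="") as src, \
            gzip.open(outputfile, "wt", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        if not skip_header:
            writer.writerow(header)
        for row in reader:
            # Rows are lists, so convert to a hashable tuple.
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
```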

csverve.api.api.rewrite_csv_file(filepath: str, outputfile: str, skip_header: bool = False, dtypes: Optional[Dict[str, str]] = None, **kwargs) → None

Generate headerless CSV files.

Parameters:

filepath – File path of CSV.
outputfile – File path of headerless CSV to be generated.
skip_header – boolean, True = skip writing header, False = write header.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.

Returns:

None.

csverve.api.api.simple_annotate_csv(in_f: str, out_f: str, col_name: str, col_val: str, col_dtype: str, skip_header: bool = False, **kwargs) → None

Simplified version of the annotate_csv method. Add column with the same value for all rows.

Parameters:

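in_f – Path to the input CSV file.
out_f – Path of the resulting annotated CSV file.
col_name – Name of the column to add.
col_val – Value written to the new column for every row.
col_dtype – pandas dtype of the new column.
skip_header – boolean, True = skip writing header, False = write header.

Returns:

None.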
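
The "same value for all rows" behaviour can be sketched with the standard library as follows. This is an illustration under assumptions, not the library's code: `add_constant_column` is a hypothetical name, the input is assumed to be gzipped with a header row, and no dtype handling or meta YAML output is attempted.

```python
import csv
import gzip


def add_constant_column(in_f, out_f, col_name, col_val, skip_header=False):
    """Append one column holding `col_val` in every row of a gzipped CSV.

    Hypothetical helper for illustration; values are written as text.
    """
    with gzip.open(in_f, "rt", newline="") as src, \
            gzip.open(out_f, "wt", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        # Extend the header with the new column name.
        header = next(reader) + [col_name]
        if not skip_header:
            writer.writerow(header)
        for row in reader:
            writer.writerow(row + [col_val])
```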

csverve.api.api.write_dataframe_to_csv_and_yaml(df: DataFrame, outfile: str, dtypes: Dict[str, str], skip_header: bool = False, **kwargs) → None

Write a pandas DataFrame to a CSV file and its meta YAML file.

Parameters:

df – pandas DataFrame.
outfile – Path of CSV & YAML file to be written to.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.
skip_header – boolean, True = skip writing header, False = write header.

Returns:

None.
