csverve.api package
Submodules
csverve.api.api module
- csverve.api.api.add_col_from_dict(infile, col_data, outfile, dtypes, skip_header=False, **kwargs)
Add a column to a gzipped CSV file. The parameter descriptions are still to be filled in.
- Parameters:
infile –
col_data –
outfile –
dtypes –
skip_header –
- Returns:
- csverve.api.api.annotate_csv(infile: str, annotation_df: DataFrame, outfile, annotation_dtypes, on='cell_id', skip_header: bool = False, **kwargs)
TODO: fill this in.
- Parameters:
infile –
annotation_df –
outfile –
annotation_dtypes –
on –
skip_header –
- Returns:
- csverve.api.api.concatenate_csv(inputfiles: List[str], output: str, skip_header: bool = False, drop_duplicates: bool = False, **kwargs) -> None
Concatenate gzipped CSV files. The dtypes in their meta YAML files must be the same.
- Parameters:
inputfiles – List of gzipped CSV file paths, or a dictionary where the keys are file paths.
output – Path of resulting concatenated gzipped CSV file and meta YAML.
skip_header – boolean, True = don’t write header, False = write header.
- Returns:
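As a rough illustration of the idea behind concatenation, the sketch below uses only the Python standard library rather than csverve itself. It checks that every input declares the same dtypes, then writes the header once, followed by all data rows. The helper names (`write_gz_csv`, `concat_gz_csv`, `read_gz_csv`) and the in-memory dtype dictionaries that stand in for the meta YAML files are assumptions made for this example.

```python
import csv
import gzip
import os
import tempfile


def write_gz_csv(path, header, rows):
    """Write a header line and data rows to a gzipped CSV file."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_gz_csv(path):
    """Read every line of a gzipped CSV file as a list of strings."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.reader(handle))


def concat_gz_csv(in_paths, out_path, dtypes_per_file):
    """Concatenate gzipped CSVs after checking that their dtypes agree."""
    first = dtypes_per_file[0]
    for other in dtypes_per_file[1:]:
        if other != first:
            raise ValueError("dtypes differ between input files")
    with gzip.open(out_path, "wt", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(list(first))  # header is written once
        for path in in_paths:
            with gzip.open(path, "rt", newline="") as handle:
                reader = csv.reader(handle)
                next(reader)  # skip this input's own header
                writer.writerows(reader)


# Hypothetical example data: two small files with identical dtypes.
tmp_dir = tempfile.mkdtemp()
path_a = os.path.join(tmp_dir, "a.csv.gz")
path_b = os.path.join(tmp_dir, "b.csv.gz")
path_out = os.path.join(tmp_dir, "out.csv.gz")
dtypes = {"cell_id": "str", "reads": "int"}

write_gz_csv(path_a, list(dtypes), [["c1", 10]])
write_gz_csv(path_b, list(dtypes), [["c2", 20]])
concat_gz_csv([path_a, path_b], path_out, [dtypes, dtypes])
concat_rows = read_gz_csv(path_out)
```

Checking the dtypes before any output is written means a mismatch fails early instead of leaving a partially written file.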
- csverve.api.api.concatenate_csv_files_pandas(in_filenames: Union[List[str], Dict[str, str]], out_filename: str, dtypes: Dict[str, str], skip_header: bool = False, drop_duplicates: bool = False, **kwargs) -> None
Concatenate gzipped CSV files.
- Parameters:
in_filenames – List of gzipped CSV file paths, or a dictionary where the keys are file paths.
out_filename – Path of resulting concatenated gzipped CSV file and meta YAML.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.
skip_header – boolean, True = don’t write header, False = write header.
- Returns:
- csverve.api.api.concatenate_csv_files_quick_lowmem(inputfiles: List[str], output: str, dtypes: Dict[str, str], columns: List[str], skip_header: bool = False, **kwargs) -> None
Concatenate gzipped CSV files.
- Parameters:
inputfiles – List of gzipped CSV file paths.
output – Path of resulting concatenated gzipped CSV file and meta YAML.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.
columns – List of column names for newly concatenated gzipped CSV file.
skip_header – boolean, True = don’t write header, False = write header.
- Returns:
- csverve.api.api.merge_csv(in_filenames: Union[List[str], Dict[str, str]], out_filename: str, how: str, on: List[str], skip_header: bool = False, **kwargs) -> None
Create one gzipped CSV out of multiple gzipped CSVs.
- Parameters:
in_filenames – List of gzipped CSV file paths, or a dictionary containing file paths as keys.
out_filename – Path to the newly merged CSV.
how – How to join DataFrames (inner, outer, left, right).
on – Column(s) to join on, comma separated if multiple.
skip_header – boolean, True = don’t write header, False = write header.
- Returns:
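To make the `how` and `on` parameters more concrete, here is a minimal pure-Python sketch of join semantics on lists of dict rows. It is not csverve's implementation and does not use pandas. The function name `merge_rows` is hypothetical, only the `inner` and `outer` modes are covered, and row order follows the inputs, which may differ from what a DataFrame merge produces.

```python
def merge_rows(left, right, on, how="inner"):
    """Join two lists of dict rows on the key columns listed in `on`."""
    if how not in ("inner", "outer"):
        raise ValueError("this sketch only covers 'inner' and 'outer'")

    def key(row):
        return tuple(row[col] for col in on)

    # Index the right-hand rows by their join key.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(key(row), []).append(row)

    merged, matched = [], set()
    for row in left:
        partners = right_by_key.get(key(row), [])
        if partners:
            matched.add(key(row))
            merged.extend({**row, **partner} for partner in partners)
        elif how == "outer":
            merged.append(dict(row))  # left row with no partner

    if how == "outer":
        for row_key, rows in right_by_key.items():
            if row_key not in matched:
                merged.extend(dict(row) for row in rows)

    # Give every row the same columns; missing values become None.
    columns = list(dict.fromkeys(col for row in left + right for col in row))
    return [{col: row.get(col) for col in columns} for row in merged]


# Hypothetical example data sharing the "cell_id" column.
left = [{"cell_id": "c1", "reads": 10}, {"cell_id": "c2", "reads": 20}]
right = [{"cell_id": "c2", "quality": 0.9}, {"cell_id": "c3", "quality": 0.5}]

inner = merge_rows(left, right, on=["cell_id"], how="inner")
outer = merge_rows(left, right, on=["cell_id"], how="outer")
```

An inner join keeps only keys present on both sides, while an outer join keeps every key and leaves the columns that have no counterpart empty.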
- csverve.api.api.read_csv(infile: str, chunksize: Optional[int] = None, usecols=None, dtype=None) -> DataFrame
Read in CSV file and return as a pandas DataFrame.
Assumes a YAML meta file in the same path with the same name, with a .yaml extension. The YAML file structure is described at the top of the module's source file.
- Parameters:
infile – Path to CSV file.
chunksize – Number of rows to read at a time (optional, applies to large datasets).
usecols – Restrict to specific columns (optional).
dtype – Override the dtypes on specific columns (optional).
- Returns:
pandas DataFrame.
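The `chunksize` and `usecols` options describe reading a large file in fixed-size batches and keeping only some columns. The sketch below illustrates that idea with the standard library only. It does not call csverve, and `iter_chunks` is a hypothetical helper name.

```python
import csv
import io
from itertools import islice


def iter_chunks(handle, chunksize, usecols=None):
    """Yield lists of at most `chunksize` dict rows from a CSV handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        if usecols is not None:
            # Keep only the requested columns.
            chunk = [{col: row[col] for col in usecols} for row in chunk]
        yield chunk


# Hypothetical example data held in memory instead of a file on disk.
text = "cell_id,reads,quality\nc1,10,0.9\nc2,20,0.8\nc3,30,0.7\n"
chunks = list(
    iter_chunks(io.StringIO(text), chunksize=2, usecols=["cell_id", "reads"])
)
```

Only one chunk is held in memory at a time when the generator is consumed lazily, which is the point of chunked reading for large datasets.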
- csverve.api.api.remove_duplicates(filepath: str, outputfile: str, skip_header: bool = False) -> None
Remove duplicate rows.
Assumes a YAML meta file in the same path with the same name, with a .yaml extension. The YAML file structure is described at the top of the module's source file.
- Parameters:
filepath – Path to the input CSV file.
outputfile – Path to the output CSV file.
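A minimal sketch of duplicate removal on plain rows is shown below. It assumes that the first occurrence of each row is kept and that the original order is preserved. The source does not state which occurrence csverve keeps, and `drop_duplicate_rows` is a hypothetical name.

```python
def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical example data with one repeated row.
rows = [["c1", "10"], ["c2", "20"], ["c1", "10"], ["c3", "30"]]
deduplicated = drop_duplicate_rows(rows)
```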
- csverve.api.api.rewrite_csv_file(filepath: str, outputfile: str, skip_header: bool = False, dtypes: Optional[Dict[str, str]] = None, **kwargs) -> None
Generate headerless CSV files.
- Parameters:
filepath – File path of CSV.
outputfile – File path of headerless CSV to be generated.
skip_header – boolean, True = don’t write header, False = write header.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.
- Returns:
- csverve.api.api.simple_annotate_csv(in_f: str, out_f: str, col_name: str, col_val: str, col_dtype: str, skip_header: bool = False, **kwargs) -> None
Simplified version of the annotate_csv function. Adds a column with the same value for all rows.
- Parameters:
in_f –
out_f –
col_name –
col_val –
col_dtype –
skip_header –
- Returns:
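Adding a column with the same value for all rows can be sketched with the standard library as follows. This is an illustration on in-memory text, not the library's code, and `add_constant_column` is a hypothetical helper name.

```python
import csv
import io


def add_constant_column(in_text, col_name, col_val):
    """Append a column holding `col_val` in every data row of a CSV text."""
    reader = csv.reader(io.StringIO(in_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + [col_name])  # extend the header
    for row in reader:
        writer.writerow(row + [col_val])  # same value on every row
    return out.getvalue()


# Hypothetical example data.
annotated = add_constant_column("cell_id,reads\nc1,10\nc2,20\n", "sample", "S1")
```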
- csverve.api.api.write_dataframe_to_csv_and_yaml(df: DataFrame, outfile: str, dtypes: Dict[str, str], skip_header: bool = False, **kwargs) -> None
Write a pandas DataFrame to a CSV file and a meta YAML file.
- Parameters:
df – pandas DataFrame.
outfile – Path of CSV & YAML file to be written to.
dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.
skip_header – boolean, True = skip writing header, False = write header.
- Returns:
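The sketch below illustrates how a CSV body and a sidecar meta description can be produced together, and how `skip_header=True` leaves the header out of the CSV while the column names survive in the meta text. The meta layout used here (a `header` flag followed by a `columns` list) is a hypothetical stand-in, since the real YAML structure is documented in the module source rather than on this page. `render_csv_and_meta` is also a hypothetical name.

```python
import csv
import io


def render_csv_and_meta(header, rows, dtypes, skip_header=False):
    """Return (csv_text, meta_text) for the given rows and column dtypes."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if not skip_header:
        writer.writerow(header)
    writer.writerows(rows)

    # Hypothetical meta layout, written by hand to avoid a YAML dependency.
    meta_lines = ["header: " + ("false" if skip_header else "true"), "columns:"]
    for name in header:
        meta_lines.append("- name: " + name)
        meta_lines.append("  dtype: " + dtypes[name])
    return out.getvalue(), "\n".join(meta_lines) + "\n"


# Hypothetical example data.
columns = ["cell_id", "reads"]
column_dtypes = {"cell_id": "str", "reads": "int"}
data = [["c1", 10], ["c2", 20]]

headerless_csv, headerless_meta = render_csv_and_meta(
    columns, data, column_dtypes, skip_header=True
)
with_header_csv, with_header_meta = render_csv_and_meta(columns, data, column_dtypes)
```

Keeping the column names and dtypes in the meta text is what makes a headerless CSV readable later without guessing its schema.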