csverve.api package

Submodules

csverve.api.api module

csverve.api.api.add_col_from_dict(infile, col_data, outfile, dtypes, skip_header=False, **kwargs)[source]

Add a column to a gzipped CSV. TODO: fill in the remaining details.

Parameters:
  • infile

  • col_data

  • outfile

  • dtypes

  • skip_header

Returns:

csverve.api.api.annotate_csv(infile: str, annotation_df: DataFrame, outfile, annotation_dtypes, on='cell_id', skip_header: bool = False, **kwargs)[source]

TODO: fill this in.

Parameters:
  • infile

  • annotation_df

  • outfile

  • annotation_dtypes

  • on

  • skip_header

Returns:

csverve.api.api.concatenate_csv(inputfiles: List[str], output: str, skip_header: bool = False, drop_duplicates: bool = False, **kwargs) None[source]

Concatenate gzipped CSV files, dtypes in meta YAML files must be the same.

Parameters:
  • inputfiles – List of gzipped CSV file paths, or a dictionary where the keys are file paths.

  • output – Path of resulting concatenated gzipped CSV file and meta YAML.

  • skip_header – boolean, True = skip writing header, False = write header.

Returns:
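To illustrate what concatenation involves, the sketch below uses only the Python standard library and does not call csverve. The helper name concat_gzipped_csvs is hypothetical, and its header check merely stands in for the requirement that the inputs describe the same columns; csverve's own implementation, including how it handles the meta YAML files, may differ.

```python
import csv
import gzip
import os
import tempfile


def concat_gzipped_csvs(input_paths, output_path, skip_header=False):
    """Hypothetical helper: stream rows from several gzipped CSVs into one.

    Assumes every input starts with a header row. Raises ValueError if the
    headers differ. Returns the number of data rows written.
    """
    expected_header = None
    rows_written = 0
    with gzip.open(output_path, "wt", newline="") as out_handle:
        writer = csv.writer(out_handle)
        for path in input_paths:
            with gzip.open(path, "rt", newline="") as in_handle:
                reader = csv.reader(in_handle)
                header = next(reader)
                if expected_header is None:
                    # The first file defines the expected header.
                    expected_header = header
                    if not skip_header:
                        writer.writerow(header)
                elif header != expected_header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Usage: build two tiny inputs in a temporary directory and concatenate them.
workdir = tempfile.mkdtemp()
part_a = os.path.join(workdir, "a.csv.gz")
part_b = os.path.join(workdir, "b.csv.gz")
for path, rows in [(part_a, [["c1", "1"]]), (part_b, [["c2", "2"]])]:
    with gzip.open(path, "wt", newline="") as handle:
        csv.writer(handle).writerows([["cell_id", "value"]] + rows)

combined = os.path.join(workdir, "combined.csv.gz")
n_rows = concat_gzipped_csvs([part_a, part_b], combined)
```

Rows are streamed one at a time, so memory use stays flat regardless of input size.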

csverve.api.api.concatenate_csv_files_pandas(in_filenames: Union[List[str], Dict[str, str]], out_filename: str, dtypes: Dict[str, str], skip_header: bool = False, drop_duplicates: bool = False, **kwargs) None[source]

Concatenate gzipped CSV files.

Parameters:
  • in_filenames – List of gzipped CSV file paths, or a dictionary where the keys are file paths.

  • out_filename – Path of resulting concatenated gzipped CSV file and meta YAML.

  • dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.

  • skip_header – boolean, True = skip writing header, False = write header.

Returns:

csverve.api.api.concatenate_csv_files_quick_lowmem(inputfiles: List[str], output: str, dtypes: Dict[str, str], columns: List[str], skip_header: bool = False, **kwargs) None[source]

Concatenate gzipped CSV files.

Parameters:
  • inputfiles – List of gzipped CSV file paths.

  • output – Path of resulting concatenated gzipped CSV file and meta YAML.

  • dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.

  • columns – List of column names for newly concatenated gzipped CSV file.

  • skip_header – boolean, True = skip writing header, False = write header.

Returns:

csverve.api.api.get_columns(infile)[source]

csverve.api.api.get_dtypes(infile)[source]

csverve.api.api.merge_csv(in_filenames: Union[List[str], Dict[str, str]], out_filename: str, how: str, on: List[str], skip_header: bool = False, **kwargs) None[source]

Create one gzipped CSV out of multiple gzipped CSVs.

Parameters:
  • in_filenames – List of gzipped CSV file paths, or a dictionary with file paths as keys.

  • out_filename – Path to the newly merged gzipped CSV.

  • how – How to join DataFrames (inner, outer, left, right).

  • on – Column(s) to join on, comma separated if multiple.

  • skip_header – boolean, True = skip writing header, False = write header.

Returns:
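The four join modes accepted by how can be sketched on plain lists of dictionaries. This is a standard-library illustration only, not csverve's implementation: the helper name join_rows is hypothetical, it assumes each key occurs at most once per table, and it fills missing values with None where pandas would use NaN.

```python
def join_rows(left, right, on, how="inner"):
    """Hypothetical helper: join two lists of dict rows on the columns in `on`."""

    def key_of(row):
        return tuple(row[col] for col in on)

    left_by_key = {key_of(row): row for row in left}
    right_by_key = {key_of(row): row for row in right}

    if how == "inner":
        keys = [k for k in left_by_key if k in right_by_key]
    elif how == "left":
        keys = list(left_by_key)
    elif how == "right":
        keys = list(right_by_key)
    elif how == "outer":
        keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]
    else:
        raise ValueError(f"unsupported join type: {how}")

    # Ordered union of all column names, left table first.
    columns = []
    for row in left + right:
        for col in row:
            if col not in columns:
                columns.append(col)

    merged = []
    for k in keys:
        row = {col: None for col in columns}
        row.update(left_by_key.get(k, {}))
        row.update(right_by_key.get(k, {}))
        merged.append(row)
    return merged


left_rows = [{"cell_id": "c1", "reads": "10"}, {"cell_id": "c2", "reads": "20"}]
right_rows = [{"cell_id": "c2", "quality": "0.9"}, {"cell_id": "c3", "quality": "0.5"}]

inner = join_rows(left_rows, right_rows, on=["cell_id"], how="inner")
outer = join_rows(left_rows, right_rows, on=["cell_id"], how="outer")
```

Here inner keeps only c2, the one key present in both tables, while outer keeps c1, c2, and c3.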

csverve.api.api.read_csv(infile: str, chunksize: Optional[int] = None, usecols=None, dtype=None) DataFrame[source]

Read in CSV file and return as a pandas DataFrame.

Assumes a YAML meta file at the same path with the same name plus a .yaml extension. The YAML file structure is documented at the top of the module's source file.

Parameters:
  • infile – Path to CSV file.

  • chunksize – Number of rows to read at a time (optional, applies to large datasets).

  • usecols – Restrict to specific columns (optional).

  • dtype – Override the dtypes on specific columns (optional).

Returns:

pandas DataFrame.
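The idea behind chunksize, reading a fixed number of rows at a time so that a large file never has to fit in memory, can be sketched with the standard library. The helper name iter_csv_chunks is hypothetical and yields lists of dictionaries rather than DataFrames; how csverve itself behaves when chunksize is set is not described here.

```python
import csv
import io
from itertools import islice


def iter_csv_chunks(handle, chunksize):
    """Hypothetical helper: yield lists of up to `chunksize` dict rows."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    reader = csv.DictReader(handle)
    while True:
        # islice pulls at most `chunksize` rows without reading further ahead.
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        yield chunk


text = "cell_id,value\nc1,1\nc2,2\nc3,3\n"
chunks = list(iter_csv_chunks(io.StringIO(text), chunksize=2))
chunk_sizes = [len(chunk) for chunk in chunks]
```

With three data rows and a chunk size of two, the final chunk is shorter than the rest.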

csverve.api.api.remove_duplicates(filepath: str, outputfile: str, skip_header: bool = False) None[source]

Remove duplicate rows.

Assumes a YAML meta file at the same path with the same name plus a .yaml extension. The YAML file structure is documented at the top of the module's source file.

Parameters:
  • filepath – Path to the input CSV file.

  • outputfile – Path to the output CSV file.
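Row-level de-duplication can be sketched as follows. This is a standard-library illustration with a hypothetical helper name, drop_duplicate_rows; it assumes the first occurrence of each row is the one to keep, which is not stated above and may not match csverve's behaviour.

```python
def drop_duplicate_rows(rows):
    """Hypothetical helper: keep the first occurrence of each row, in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [["c1", "1"], ["c2", "2"], ["c1", "1"], ["c3", "3"], ["c2", "2"]]
deduplicated = drop_duplicate_rows(rows)
```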

csverve.api.api.rewrite_csv_file(filepath: str, outputfile: str, skip_header: bool = False, dtypes: Optional[Dict[str, str]] = None, **kwargs) None[source]

Generate headerless CSV files.

Parameters:
  • filepath – File path of CSV.

  • outputfile – File path of headerless CSV to be generated.

  • skip_header – boolean, True = skip writing header, False = write header.

  • dtypes – Dictionary of pandas dtypes, where key = column name, value = dtype.

Returns:

csverve.api.api.simple_annotate_csv(in_f: str, out_f: str, col_name: str, col_val: str, col_dtype: str, skip_header: bool = False, **kwargs) None[source]

Simplified version of the annotate_csv method. Add column with the same value for all rows.

Parameters:
  • in_f

  • out_f

  • col_name

  • col_val

  • col_dtype

  • skip_header

Returns:
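Adding a column that holds the same value in every row can be sketched with the standard library alone. The helper name add_constant_column is hypothetical, it works on open text handles rather than gzipped files, and it ignores dtypes and the meta YAML file, so it illustrates the idea rather than csverve's implementation.

```python
import csv
import io


def add_constant_column(in_handle, out_handle, col_name, col_val):
    """Hypothetical helper: append `col_name` with value `col_val` to every row."""
    reader = csv.reader(in_handle)
    writer = csv.writer(out_handle, lineterminator="\n")
    header = next(reader)
    if col_name in header:
        raise ValueError(f"column already exists: {col_name}")
    writer.writerow(header + [col_name])
    for row in reader:
        writer.writerow(row + [col_val])


source = io.StringIO("cell_id,value\nc1,1\nc2,2\n")
result = io.StringIO()
add_constant_column(source, result, "sample_id", "SA123")
annotated_text = result.getvalue()
```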

csverve.api.api.write_dataframe_to_csv_and_yaml(df: DataFrame, outfile: str, dtypes: Dict[str, str], skip_header: bool = False, **kwargs) None[source]

Write a pandas DataFrame to a CSV file and its meta YAML file.

Parameters:
  • df – pandas DataFrame.

  • outfile – Path of CSV & YAML file to be written to.

  • dtypes – dictionary of pandas dtypes by column, keys = column name, value = dtype.

  • skip_header – boolean, True = skip writing header, False = write header

Returns:

Module contents