1

I have 1,000 files; the start of each file looks like this:

!dataset_description = Analysis of POF D119 mutation. 
!dataset_type = Expression profiling by array
!dataset_pubmed_id = 17318176
!dataset_platform = GPL1322

The aim: I want to turn this information into a list so that I can build an Excel spreadsheet covering all the files; i.e. I want the list to look like this:

[Analysis_of_POF_D119_mutation,Expression_profiling_by_array,17318176,GPL1322]

I have this code. It only extracts the first variable, "!dataset_description", but I would subsequently run it on each variable of interest, i.e. !dataset_type, !dataset_pubmed_id, !dataset_platform:

import sys

OpenDataset = open(sys.argv[1], 'r')
Dataset = OpenDataset.readlines()
ListOfInformation = []
formatted_line = lambda x: "_".join(line.strip().split("=")[x].split())
for line in Dataset:
    if line.startswith("!dataset_description"):
        description = formatted_line(1)
        print(description)

The code works, however, I am now at a stage where I understand python basics, and I want to start coding more "pythonically". I have two questions.

  1. It seems silly to use the lambda expression that I am using. "x" in the lambda expression will always be 1, since I will always want what comes after the "=" sign. Therefore x isn't really a "variable", but then I can't have a lambda expression without a variable.

I tried to change the variable to being what the line starts with, which is the true variable, doing something like this:

formatted_line = lambda x: "_".join(line.strip().split("=")[1].split()) if line.startswith(x)

However, this code returns a syntax error.

Would someone know how to make the above lambda expression work?

  2. These files have the potential to be really big. However, the information that I need is at the start of the file, and every line of it starts with the "!" symbol. So it seems silly to read in the whole file when I only need the first few lines, all of which start with "!" (the exact number of such lines varies per file). Is there a way to read in just the lines starting with "!", or is it quicker just to use file.readlines()?
asked Jul 18, 2016 at 11:07
6
  • 1
    Why are you passing 1 always? Pass the line instead. Commented Jul 18, 2016 at 11:09
  • 2
    "then I can't have a lambda expression without a variable" -- sure you can. Just don't put in a variable. Commented Jul 18, 2016 at 11:09
  • Lambda "expression"s should produce a value, just like any other expression. In the last lambda version, what will be the result of the expression if the line doesn't start with x? That is why it produces a Syntax error. Commented Jul 18, 2016 at 11:11
  • 1
    when the lambda starts getting too long, consider writing a named function, especially since you already assign your lambda to a variable (thus, with a name) Commented Jul 18, 2016 at 11:14
  • What λuser said. Using a lambda instead of def for a named function is generally considered bad style in Python, although that rule is sometimes bent, eg when creating a key function that's used as an arg to sort or sorted and then immediately re-used as an arg to itertools.groupby. Apart from brevity, lambdas have no advantage over full function definitions, but they have several disadvantages. So you should only use them when a simple anonymous function is appropriate. Commented Jul 18, 2016 at 11:34

2 Answers

2

You certainly can have a lambda expression without an argument.

However, in this case, you should actually pass an argument: the line itself. That is the thing that you're operating on, therefore it should be passed into the function.

Your if does not work because an inline if (a conditional expression) in Python must always have an else clause. In this case the value in the else branch is the empty string.

So, passing the prefix to match as a second argument:

formatted_line = lambda line, prefix: "_".join(line.strip().split("=")[1].split()) if line.startswith(prefix) else ""
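As the comments on the question suggest, a lambda that is assigned to a name can usually be written as a plain function instead. The following is only a sketch of that idea: the name `format_value` and the `prefix` parameter are made up for illustration, and it swaps `split("=")[1]` for `str.partition`, which behaves differently when the value itself contains an "=" sign.

```python
def format_value(line, prefix):
    """Return the text after the first '=' with whitespace runs replaced
    by '_', or an empty string if the line does not start with prefix."""
    if not line.startswith(prefix):
        return ""
    # partition splits on the first '=' only, so an '=' inside the value
    # is kept (unlike split("=")[1], which would cut the value short).
    _, _, value = line.partition("=")
    return "_".join(value.split())


line = "!dataset_description = Analysis of POF D119 mutation. \n"
print(format_value(line, "!dataset_description"))
print(format_value(line, "!dataset_type"))  # prefix does not match
```

Note that the trailing full stop of the description is kept; stripping it would need an extra step such as `value.strip().rstrip(".")`.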

If you only want to read values until the lines stop starting with !, you can use itertools.takewhile:

from itertools import takewhile
...
for line in takewhile(lambda line: line.startswith("!"), Dataset):
    pass  # process each "!" line here
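To avoid loading a large file at all, the same takewhile call can be given the open file object rather than the list returned by readlines(), since a file object yields its lines one at a time. Below is a rough sketch under two assumptions: the "!" lines form one contiguous block at the very top of the file, and each has the form `key = value`. The names `read_header` and `WANTED`, and the final tab-separated sample line, are invented for the example.

```python
import io
from itertools import takewhile

WANTED = ["!dataset_description", "!dataset_type",
          "!dataset_pubmed_id", "!dataset_platform"]


def read_header(handle, wanted=WANTED):
    """Collect the wanted '!key = value' fields from the top of an open file."""
    found = {}
    # Iterating the handle yields one line at a time; takewhile stops at the
    # first line that does not start with '!', so later lines are not iterated.
    for line in takewhile(lambda l: l.startswith("!"), handle):
        key, _, value = line.partition("=")
        found[key.strip()] = "_".join(value.split())
    # Missing keys become empty strings so every row has the same columns.
    return [found.get(key, "") for key in wanted]


# io.StringIO stands in for a real file opened with open(path).
sample = io.StringIO(
    "!dataset_description = Analysis of POF D119 mutation. \n"
    "!dataset_type = Expression profiling by array\n"
    "!dataset_pubmed_id = 17318176\n"
    "!dataset_platform = GPL1322\n"
    "ID_REF\tVALUE\n"
)
print(read_header(sample))
```

With real files, something like `with open(path) as handle: row = read_header(handle)` would produce one row per file, which could then be written out with the csv module.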
answered Jul 18, 2016 at 11:12

2

It raises a SyntaxError because you're missing an else branch. The conditional expression (also called "expression if" or "inline if") has the syntax:

<value to return when True> if <condition> else <value when False>

You can't use elif.

So the code might look like this:

formatted_line = lambda x: "_".join(line.strip().split("=")[1].split()) if line.startswith(x) else "" # You can replace this with `None`.
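The rule about the else branch can be checked without running the expression, by asking Python to compile it. This is just an illustrative sketch; `is_valid_expression` is a helper made up for the example, and `flag` and `other` are placeholder names that are never evaluated.

```python
def is_valid_expression(source):
    """Return True if source compiles as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Without an else branch the conditional expression is rejected.
print(is_valid_expression('"yes" if flag'))
# With an else branch it compiles.
print(is_valid_expression('"yes" if flag else "no"'))
# There is no elif, but conditional expressions can be chained instead.
print(is_valid_expression('"a" if flag else "b" if other else "c"'))
```

Chained conditional expressions get hard to read quickly, which is one more reason to prefer a named function with an ordinary if statement.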
answered Jul 18, 2016 at 11:13
