Monday, June 3, 2024
Asked: Wed, August 21, 2013 / answers: 1 / hits: 18175

The problem: A website I am trying to gather data from uses JavaScript to draw a graph. I'd like to pull the data that feeds the graph, but I am not sure where to start. For example, the data might look like this:



var line1=
[["Wed, 12 Jun 2013 01:00:00 +0000",22.4916114807,"2 sold"],
["Fri, 14 Jun 2013 01:00:00 +0000",27.4950008392,"2 sold"],
["Sun, 16 Jun 2013 01:00:00 +0000",19.5499992371,"1 sold"],
["Tue, 18 Jun 2013 01:00:00 +0000",17.25,"1 sold"],
["Sun, 23 Jun 2013 01:00:00 +0000",15.5420341492,"2 sold"],
["Thu, 27 Jun 2013 01:00:00 +0000",8.79045295715,"3 sold"],
["Fri, 28 Jun 2013 01:00:00 +0000",10,"1 sold"]];


This is pricing data (Date, Price, Volume). I've found another question here, "Parsing variable data out of a js tag using python", which suggests using JSON and BeautifulSoup, but I am unsure how to apply it to this particular problem because the formatting is slightly different. In fact, the data here looks more like a Python list than any kind of JSON dictionary.



I suppose I could read it in as a string and then use XPath and some funky string editing to convert it, but that seems like too much work for something that is already formatted as a JavaScript variable.



So, how can I pull this kind of organized data out of the variable using Python? (I am most familiar with Python and BS4.)



 Answers

Okay, so there are a few ways to do it, but I ended up simply using a regular expression to find everything between line1= and ;



import re

# Read the page data as a string (sock is the already-opened connection to the page)
pageData = sock.read()
# Compile a pattern that captures everything between "line1=" and ";"
p = re.compile('(?<=line1=)(.*)(?=;)')
# Find all matches of the pattern in pageData
parsed = p.findall(pageData)
# Evaluate the first match as Python code to turn it into a Python list
newParsed = eval(parsed[0])


Regex is nice when the page's code is consistently formatted, but is this method better (EDIT: or worse!) than any of the other answers here?
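For comparison, one middle ground between eval and a real parser is ast.literal_eval from the standard library, which only builds Python literals and rejects function calls and other expressions. A minimal sketch, assuming the captured text is a literal list shaped like the data in the question; the sample string and the is_safe_literal helper are made up for illustration:

```python
import ast

# Hypothetical captured text, shaped like the data in the question.
captured = (
    '[["Wed, 12 Jun 2013 01:00:00 +0000", 22.4916114807, "2 sold"], '
    '["Fri, 28 Jun 2013 01:00:00 +0000", 10, "1 sold"]]'
)

# literal_eval accepts only literals (lists, strings, numbers, ...).
rows = ast.literal_eval(captured)


def is_safe_literal(text):
    """Return True if text parses as a plain Python literal."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False
```

Anything that is not a plain literal, such as a function call hidden in the page, is rejected with an exception instead of being executed.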



EDIT: I ultimately used the following:



import json
import re

# Read the page data as a string (sock is the already-opened connection to the page)
pageData = sock.read()
# Compile a pattern that captures everything between "line1=" and ";"
p = re.compile('(?<=line1=)(.*)(?=;)')
# Find all matches of the pattern in pageData
parsed = p.findall(pageData)
# Load as JSON instead of using eval, to avoid executing unknown code
newParsed = json.loads(parsed[0])
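As a self-contained sketch of the same idea, the snippet below runs the extraction on an inline sample and then converts each row into typed values. The page_data string, the function names, and the record layout are assumptions for illustration, not the real page or a definitive implementation. The pattern here differs slightly from the one above: it is non-greedy and uses re.DOTALL so that it also works when the array spans several lines.

```python
import json
import re
from email.utils import parsedate_to_datetime

# Hypothetical page source standing in for sock.read(); the real page will differ.
page_data = '''<script>
var line1=
[["Wed, 12 Jun 2013 01:00:00 +0000",22.4916114807,"2 sold"],
["Fri, 14 Jun 2013 01:00:00 +0000",27.4950008392,"2 sold"]];
var other = 1;
</script>'''

# Non-greedy capture of the array; DOTALL lets "." match newlines.
# The match stops at the first "];" after "line1=".
pattern = re.compile(r'line1\s*=\s*(\[.*?\]);', re.DOTALL)


def extract_line1(text):
    """Return the line1 array as a list of [date, price, volume] rows."""
    match = pattern.search(text)
    if match is None:
        raise ValueError("line1 not found")
    return json.loads(match.group(1))


def to_records(rows):
    """Convert raw rows into dicts with a datetime, a float and an int."""
    records = []
    for date_text, price, volume_text in rows:
        records.append({
            "date": parsedate_to_datetime(date_text),  # RFC 2822 style date
            "price": float(price),
            "volume": int(volume_text.split()[0]),     # "2 sold" -> 2
        })
    return records


records = to_records(extract_line1(page_data))
```

One caveat with this pattern: if a string inside the array ever contained "];", the match would end too early, so it is worth checking the result against the page by hand.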

[#76235] Tuesday, August 20, 2013